A model of language syntax acquisition is formulated as an
inference problem: to guess the wiring diagram of an unreliable
automaton. A constructive method is developed which solves the
grammatical inference problem via an abductory inductive process
applied to the sample strings generated by the stochastic
automaton whose internal wiring diagram is unavailable for
inspection. The right invariant equivalence classes which
correspond to the states of the sought-for automaton are
established by the training sequence and a teacher. The
structural description of strings is found directly without
a priori assumptions on the number of states (or lengths of
strings). Several examples are used to illustrate the behavior
of various algorithms which were developed to carry out the
synthesis; among these is a fragment of English grammar. The
same method for solution of the above problem can be used to
establish word classes. The dictionary is partitioned into the
classes induced by an equivalence relation defined by grammatical
substitutability of words. All the algorithms are formally
described and implemented on a digital computer in A Programming
Language (APL). After some finite time, the algorithm establishes
the minimal state, completely specified, deterministic, performance
model automaton. The original grammar is obtained together with
the frequency-defined probabilities which approximate the
probabilities imposed on the rules of the grammar. That is, the
states, wires, and approximate transition probabilities of the
original automaton are obtained. These "abduction machines" are
analyzed for convergence rates and robustness (stability with
respect to an imperfect teacher).
Chapter 1

Introduction

This paper deals with a model of language syntax acquisition.
It is assumed that the artificial language has a finite description
which we hope to discover on the basis of a finite sample of
sentences. Specifically excluded is the simple formulation of the
observations actually [illegible], although [illegible] is highly
desirable. In the terminology of learning models, an insightful
model is to be preferred over the rote learning exemplified in a
list.

A proposal for the study of this problem was posed in the
paper "Pattern Conception" by Miller and Chomsky [1957]; this
paper elaborated on the virtues of finite state automaton models.
An early description of a machine to carry out grammar discovery
is given by Solomonoff [1957]. Variations of this problem
are found in artificial intelligence, human cognitive studies,
pattern recognition, linguistics, and in systems theory under
labels such as inductive inference, automaton identification and
grammatical inference. The fine exposition by Fu [1974] in a
chapter entitled "Grammatical Inference for Syntactic Pattern
Recognition" surveys many approaches and also contains [illegible]
references. Additional references can be found in Trakhtenbrot
and Barzdin [1973]. More recent references are Adleman and
Blum [1975], which deals with degrees of unsolvability of inductive
inference problems, and Angluin [1976], which explores the complexity
of the inference of finite state grammars from a finite set of
positive and negative sample strings.
The following investigation is motivated toward the presentation
of natural models. The investigation follows the paths
initiated by Grenander in his abductory induction models.
If we restrict consideration to regular grammars, the number of
possible grammars is overwhelmingly large for even small size
alphabets and modest numbers of variables. This implies that
models for analogy to real world phenomena which exhibit language
acquisition ability cannot be based on enumerative inference or
finite search techniques; for this reason, we also seek an
alternative to direct implementations of maximum likelihood
solutions which select one grammar over another on probabilistic
criteria by the solution of large scale linear systems.

In particular, algorithms are presented which carry out
grammar discovery by the construction of the (right invariant)
equivalence classes induced by the finite state automaton. The
classes are established by means of a training sequence and a
teacher. The number of possible partitions induced by an
equivalence relation defined on a finite set of elements is known
to be given by sums of Stirling numbers of the second kind. To
reduce the combinatorial complexity of the problem, another
equivalence relation is defined on the word dictionary in a natural
way by the grammatical substitutability of words. It is used to
partition this dictionary into grammatical equivalence classes.
The prototype of each word class, that is, the representor of
each of the established classes, is then used to carry out the
synthesis of the minimal state automaton. The algorithms are
imbedded in a statistical environment; they are studied
experimentally and theoretically. Special attention is focussed on
the flexibility of the algorithms to accommodate non-systematic
errors (e.g. a teacher which occasionally makes errors).
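
The size of the search space mentioned above can be made concrete
with a short computation. The following is only an illustrative
sketch in Python (not the APL of the original implementation); the
function names stirling2 and partition_count are hypothetical. It
sums Stirling numbers of the second kind to count the partitions of
an n-element set.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to split n labelled elements into k non-empty classes."""
    if n == k:
        return 1  # includes the convention S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The n-th element either joins one of the k existing classes
    # or forms a new class by itself.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def partition_count(n: int) -> int:
    """Total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))


if __name__ == "__main__":
    for size in (5, 10, 20):
        print(size, partition_count(size))
```

The count grows very quickly with the size of the set, which is the
reason an enumeration of candidate partitions is avoided.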

The model consists of two components: the first component
is a teacher in a dual role. The teacher acts as a generative
grammar which produces strings in a syntax-controlled probability
language (Grenander [1967]). The random structure of the language
is induced by imposing probabilities on the rules of the grammar.
In this way, the production of some strings can be inhibited while
the production of others can be made more likely. This adds to
the teacher's grammar a facet of linguistic performance although
it does not increase the generative power of the grammar. The
teacher also serves as an acceptor of strings; it can judge
whether or not a sentence presented to it by the learner is within
its competence.

The  second  component,  called  the  learner,  consists  of  two 
principal  procedures  which  carry  out  the  construction  of  a copy 
of  the  teacher.  The  first  procedure  or  phase  is  devoted  to  the 
classification  of  words  into  equivalence  classes  on  the  basis  of 
grammatical  substitutability;  this  procedure  is  in  principle 
infinitary.  These  classes  are  familiar  in  structural  linguistics 
where  they  are  called  families  (Kulagina  [1958])  or  categories 
(Miller  and  Chomsky  [1963])  although  we  do  not  use  these  classes 
in  quite  the  same  way.  They  are  introduced  here  to  reduce  the 
combinatorial  complexity  of  the  constructions. 

The first phase begins with the assumption that all words
are in one equivalence class: this is the tabula rasa with which
the learner begins. The discovery of the classes is carried out
by the resolution of the dictionary into the classes which form
the required partition. The teacher randomly generates strings
which are presented to the learner as a training sequence. For
each string in the training sequence, a word is selected according
to a weighted probabilistic strategy; this might be thought of as
an attention function. This string is used either to strengthen
the learner's belief about the selected word's membership in its
current grammatical class or to introduce a new
hypothesis to be entertained about the relation of this word to
the sought-for partition of the dictionary. The term abduction,
introduced by C. S. Peirce [1931] to describe the starting of a
hypothesis, is applied to describe this process which either changes
the class membership of the selected word or forms a new class with
this selected word. All classes formed are characterized by a
fixed representative word called a prototype.

The second phase carries out the discovery of the syntactic
variables and the rewrite rules which govern these variables.
Initially in this phase, the learner has a tabula rasa with
respect to variables; the discovery of variables proceeds in a
manner analogous to the phase one process. Each string in the
training sequence is analyzed to determine
initial string equivalence classes, i.e. the syntactic variables.
Each string can be decoded with respect to the word class partition
determined in phase one as described above. This indexing scheme
is used to implement efficiently the process which determines the
partition of initial strings into the sought-for syntactic
variables. A subsequent encoding of initial strings is a
representation of the rewrite rules. A simple tally scheme
computes the experimental frequency-defined probabilities.
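
The tally scheme can be pictured as follows. This is a minimal
Python sketch under stated assumptions, not the original APL
routine: rule uses are assumed to be recorded as hypothetical
(variable, right_side) pairs, and the function name
rule_probabilities is invented for illustration. Each count is
normalized by the total number of uses of rules rewriting the same
variable.

```python
from collections import Counter, defaultdict


def rule_probabilities(observed_rules):
    """Estimate P(rule) for each variable from a list of observed rule uses.

    observed_rules is a list of (variable, right_side) pairs, one entry
    per application of a rule seen while analyzing the training sequence.
    """
    counts = Counter(observed_rules)
    totals = defaultdict(int)
    for (variable, _right_side), count in counts.items():
        totals[variable] += count
    # Relative frequency of each rule among the rules rewriting its variable.
    return {rule: count / totals[rule[0]] for rule, count in counts.items()}


if __name__ == "__main__":
    uses = [("S", "aS"), ("S", "aS"), ("S", "b"), ("A", "c")]
    for rule, p in sorted(rule_probabilities(uses).items()):
        print(rule, round(p, 3))
```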

The model described above is a blueprint for a language
discovery machine. Such a machine has been implemented in A
Programming Language (APL); this machine has been tested on
a fragment of English grammar and on several formal grammars.
The fragment of English grammar consists of 87 rules on 52 words
in 23 classes; the rules govern the 18 syntactic variables in
this syntax-controlled probability grammar. The teacher-learner
interaction is portrayed with no explicit semantics and no
environment (context). That is, semantics and pragmatics are
contained in neither the teacher, the learner, nor the training
sequence. The language strings appear to have a semantic aspect:
this is built into the syntactic rewrite rules. The expected
sentence length (computed from the mathematical model) is 7.05
words. In a typical experiment, 115 sentences were heard by the
learner to determine 20 of the 23 classes and to correctly
classify 48 of the words in the dictionary. After 19 sentences
were analyzed, all of the 18 variables were discovered. The
graph in figure 4.1 illustrates the learning
characteristics exemplified by the word class discovery
procedure. More complete experimental results and a mathematical
model for the word class learning appear in later sections.


Preliminaries

The following sections, which present no new results,
contain the definitions and results from formal language theory
and automaton theory which serve as research background for
syntactic abduction of linear strings presented in the later
chapters. The exposition follows Hopcroft and Ullman [1969].
The exposition of the syntax-controlled probabilities follows
Grenander [1967].
Phrase Structure Languages

1. Let $V_T$ denote a finite set of symbols called terminals or
words; let $V_T^+$ denote the set of all finite length strings of
these words; and let $V_T^*$ denote $V_T^+ \cup \{\mathrm{NULL}\}$,
where the empty string of length zero is denoted by NULL. A
language is any element of the powerset of $V_T^*$. Those (possibly
infinite) subsets of $V_T^*$ which have finite generating
representations are called recursively enumerable. These finite
generating representations or specifications are called phrase
structure grammars and are formulated as follows: introduce an
auxiliary finite set of symbols, denoted by $V_N$, called
non-terminals or syntactic variables, with a distinguished symbol
$S \in V_N$; introduce a finite set of rewrite rules R which govern
these variables and which is a subset of $V^+ \times V^*$, where V
denotes $V_N \cup V_T$. Then the phrase structure language consists
of the strings which can be derived from S (the start symbol) by
successive application of the rules. This is rigorously
described by the introduction of a relation from $V^+$ to $V^*$ as
follows: for any $u \in V^+$ and $v \in V^*$, u is said to directly
derive v (in the grammar) if there are strings $i, j, x, y \in V^*$
such that u = xiy, v = xjy and $(i,j) \in R$. This can be extended
by saying that u derives v in the grammar if either u = v or if
there is a finite sequence $Z_0, Z_1, \ldots, Z_m \in V^*$,
$m \ge 1$, such that $u = Z_0$, $v = Z_m$, and $Z_l$ directly derives
$Z_{l+1}$ for $l = 0(1)m-1$. Then the language generated by this
grammar is defined as the set of strings of words which can be
derived from the start symbol S.
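
The two relations just defined can be checked mechanically. The
following Python sketch is illustrative only and rests on several
assumptions that are not part of the original text: every symbol is
a single character, the rules are given as (i, j) pairs of strings
with i non-empty, and the function names are hypothetical.

```python
def directly_derives(u, v, rules):
    """True if u = x i y and v = x j y for some rule (i, j) and strings x, y."""
    for i, j in rules:
        start = u.find(i)
        while start != -1:
            # Rewrite this occurrence of i by j and compare with v.
            if u[:start] + j + u[start + len(i):] == v:
                return True
            start = u.find(i, start + 1)
    return False


def is_derivation(sequence, rules):
    """True if each string in the sequence directly derives the next one.

    A sequence of length one corresponds to the case u = v.
    """
    return all(
        directly_derives(a, b, rules) for a, b in zip(sequence, sequence[1:])
    )


if __name__ == "__main__":
    rules = [("S", "aS"), ("S", "b")]  # hypothetical grammar: a...ab
    print(is_derivation(["S", "aS", "aaS", "aab"], rules))
```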





If S derives u (denoted by $S \Rightarrow u$) and u contains variables,
then u is called a sentential form. Note that the elements in
the set of rewrite rules, for example denoted by (i,j), are
customarily also written as $i \to j$. If two grammars generate
the same language, then they are said to be weakly equivalent.

2. Context-Free Languages. If the rewrite rules R are
restricted to finite subsets of $V_N \times V^*$, then the resultant
grammar is called a context-free grammar.

3. Finite-State Languages. Those subsets of context-free
grammars on which we focus our attention are called finite state
grammars. The variables are governed by rewrite rules of two
types: continuing $i \to xj$ or terminating $i \to x$, where
$i, j \in V_N$ and $x \in V_T$. Denote the finite set of rules which
rewrite i by $R_i$ and assume that the generating algorithm begins
by the application of a rule selected from $R_S$.

The language generated by such a grammar, which consists
of $V_T$, $V_N$ and R, is denoted by L(G); the symbol G denotes the
triple $(V_T, V_N, R)$ and R denotes the finite set of all rules.
The language L(G) consists of all those strings in $V_T^*$ which can
be produced by the application of the rules in R.


4. Relation to Finite Automata. The language generated
by a grammar is some set of strings as described above. This
set is also the set accepted by some finite state automaton.
The identification of L(G) with its finite state automaton
acceptor is as follows: the variables of the grammar
correspond to the states, where the start state
corresponds to the distinguished variable S introduced above and,
in addition, we introduce a final state which is the "target"
state for those variables which are governed by terminating
rewrite rules.


5. Syntax-Controlled Probabilities. For each
$R_i = \{r_{i1}, r_{i2}, \ldots, r_{in_i}\}$, where r denotes a rule
and $n_i$ is the number of rules rewriting i, introduce a
probability distribution over $R_i$ so that

$$\sum_{k=1}^{n_i} P(r_{ik}) = 1,$$

where $i = 1, 2, \ldots, n$.
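
To illustrate how probabilities imposed on the rules turn a finite
state grammar into a generator of random strings, here is a small
Python sketch. Everything in it is assumed for illustration: the
toy grammar, the encoding of a rule as a (probability, word,
next_variable) triple, and the function names. It is not the
teacher used in the experiments.

```python
import random

# Hypothetical grammar: variable -> list of (probability, word, next_variable).
# next_variable is None for a terminating rule i -> x.
GRAMMAR = {
    "S": [(0.6, "the", "N"), (0.4, "a", "N")],
    "N": [(0.5, "dog", "V"), (0.5, "cat", "V")],
    "V": [(0.7, "sleeps", None), (0.3, "sees", "S")],
}


def check_distributions(grammar, tol=1e-9):
    """Raise ValueError unless the rule probabilities of each variable sum to 1."""
    for variable, rules in grammar.items():
        total = sum(p for p, _word, _nxt in rules)
        if abs(total - 1.0) > tol:
            raise ValueError(f"rules rewriting {variable} sum to {total}")


def generate(grammar, rng, start="S"):
    """Produce one sentence (a list of words) by repeated rule selection."""
    sentence, variable = [], start
    while variable is not None:
        rules = grammar[variable]
        _p, word, variable = rng.choices(rules, weights=[r[0] for r in rules])[0]
        sentence.append(word)
    return sentence


if __name__ == "__main__":
    check_distributions(GRAMMAR)
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(generate(GRAMMAR, rng)))
```

Lowering the probability of a rule inhibits, but does not forbid,
the strings that use it, so the set of strings that can be generated
is unchanged.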

6. Markov Chain. Consider the application of the grammar G
together with the probability distribution. In terms of the
automaton description, the probability that the machine will
be at state l at time t+1 given that it is at state k at time
t is specified. A system which evolves through a finite number
of states $(n_v + 1)$ with a specified conditional probability of
transition between two states for a given state at time t which
is independent of t is called a finite homogeneous Markov chain
(Kemeny and Snell [1960]). The familiar state diagram for
finite automata (with labeled arcs which indicate letters in
$V_T$ and the probability) has a description in terms of two
matrices: a matrix of probabilities and a matrix which prescribes
the letters which correspond to transitions between states.
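
The two-matrix description can be sketched as follows. This Python
fragment is an illustration under assumptions of its own: the toy
grammar and its encoding are hypothetical, the extra state is named
"FINAL", and the final state is made absorbing so that every row of
the probability matrix sums to one. The expected sentence length is
obtained here by simple repeated substitution in
E_i = 1 + sum_j P_ij E_j over the non-final states, which is only
one of several ways to compute it.

```python
# Hypothetical grammar: variable -> list of (probability, word, next_variable).
GRAMMAR = {
    "S": [(0.6, "the", "N"), (0.4, "a", "N")],
    "N": [(0.5, "dog", "V"), (0.5, "cat", "V")],
    "V": [(0.7, "sleeps", None), (0.3, "sees", "S")],
}


def chain_matrices(grammar):
    """Return (states, probability matrix, letter matrix) for the grammar."""
    states = list(grammar) + ["FINAL"]
    index = {s: k for k, s in enumerate(states)}
    size = len(states)
    prob = [[0.0] * size for _ in range(size)]
    letters = [[[] for _ in range(size)] for _ in range(size)]
    for variable, rules in grammar.items():
        for p, word, nxt in rules:
            row = index[variable]
            col = index["FINAL" if nxt is None else nxt]
            prob[row][col] += p
            letters[row][col].append(word)
    prob[-1][-1] = 1.0  # assumption: the final state is absorbing
    return states, prob, letters


def expected_length(prob, start=0, sweeps=2000):
    """Expected number of words emitted before the final state is reached."""
    n = len(prob) - 1  # the last state is the final state
    e = [0.0] * n
    for _ in range(sweeps):
        e = [1.0 + sum(prob[i][j] * e[j] for j in range(n)) for i in range(n)]
    return e[start]


if __name__ == "__main__":
    states, prob, letters = chain_matrices(GRAMMAR)
    print(states)
    print(expected_length(prob))
```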
Chapter 2

The Word Class Discovery Algorithm

1. Problem Statement

The grammar G defined on the word dictionary induces an
equivalence relation on $V_T$ in a natural way. This equivalence
relation is based on grammatical substitutability. Introduce
a Boolean function g, the grammaticality function defined on the
set $V_T^*$ of all strings so that

$$g(u) = \begin{cases} 1 & \text{if } u \text{ in } L(G) \\ 0 & \text{otherwise.} \end{cases}$$

Then the word equivalence relation (EQUIV) induced on $V_T$ is
precisely defined by:

for x, y in the finite set $V_T$, x EQUIV y if
g(uxv) = g(uyv) for all u, v in $V_T^*$.

Define a partition of $V_T$ to be a set of disjoint non-empty
subsets of $V_T$ whose union is $V_T$. We have the following

Lemma. The relation EQUIV on $V_T$ induces a partition of $V_T$ into
classes CL[J], J = 1, 2, ..., NC. For each J, and any I ≠ J:

1. x, y in CL[J] implies TRUE = x EQUIV y and

2. x in CL[I], y in CL[J] implies FALSE = x EQUIV y.




Denote  the  empty  set  by  NULL.  Denote  the  set  difference  of  A 
and  B by  A CMPL  B (also  called  the  complement  of  B in  A).  Then 
the  proof  proceeds  as  follows: 
     [1]   NC ← 0
     [2]   A ← V_T
     [3]   L1: → L2 IF A = NULL
     [4]   NC ← NC + 1
     [5]   Pick x ∈ A
     [6]   CL[NC] ← {y ∈ A : y EQUIV x}
     [7]   A ← A CMPL CL[NC]
     [8]   → L1
     [9]   L2: → L3 IF NC ≠ 0
     [10]  NC ← 1
     [11]  → L2
     [12]  L3: → 0

The problem is to carry out step [6]. The formation of the set
{y ∈ A : y EQUIV x} must be carried out over $V_T^*$ which is
(denumerably) infinite.
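
The difficulty with step [6] can be seen in a small experiment. The
Python sketch below follows steps [1] to [8] but replaces EQUIV by a
bounded test that only tries contexts u, v up to a fixed length; this
bound, the toy language, and all names are assumptions made for
illustration. A bounded test can only refute equivalence, so in
general the classes it returns may be coarser than the true ones.

```python
import re
from itertools import product

WORDS = ["a", "b", "c", "d"]
LANGUAGE = re.compile(r"(a|b)c*d")  # hypothetical L(G), one letter per word


def g(string):
    """Grammaticality function: 1 if the tuple of words is in L(G), else 0."""
    return 1 if LANGUAGE.fullmatch("".join(string)) else 0


def contexts(words, max_len):
    """All strings over the dictionary of length 0 .. max_len, as tuples."""
    for n in range(max_len + 1):
        yield from product(words, repeat=n)


def equiv_bounded(x, y, words, max_len=2):
    """Approximate x EQUIV y using only contexts u, v of bounded length."""
    ctx = list(contexts(words, max_len))
    return all(g(u + (x,) + v) == g(u + (y,) + v) for u in ctx for v in ctx)


def partition(words, max_len=2):
    """Steps [1]-[8]: peel off the class of one remaining word at a time."""
    classes = []
    remaining = list(words)  # the set A
    while remaining:
        x = remaining[0]  # step [5]
        cl = [y for y in remaining if equiv_bounded(x, y, words, max_len)]
        classes.append(cl)  # steps [4] and [6]
        remaining = [y for y in remaining if y not in cl]  # step [7]
    return classes


if __name__ == "__main__":
    print(partition(WORDS))
```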

2. We now develop the algorithm to approximate the partition
which is induced by the infinitary equivalence relation EQUIV.
Make the following observation: for words x and y, if
g(uxv) ≠ g(uyv) for some strings u and v, then this is sufficient to
conclude that words x and y are not equivalent. However, for
words x and y and for some particular strings u and v, it is
possible that g(uxv) = g(uyv) but x and y are not equivalent. It
is for this reason that the usual procedures for partitioning a
set under an equivalence relation (Knuth [1973]) are not
recommended. Procedures based on aggregation of classes (or
coalescing, or establishing links between classes) might
erroneously combine two classes on the basis of a single positive
result of the test for simultaneous grammaticality of strings
containing the two tested words. The algorithm detailed in the
next section is based on the decomposition of the dictionary
into classes; the classes are formed as they are needed.

3.  An  Approximating  Algorithm.  Here  is  a learning  procedure 
which  provides  an  approximating  algorithm  for  the  partition  of 
the  dictionary  into  word  equivalence  classes.  We  assume  that 
the  entire  dictionary  is  known;  the  dictionary  is  unfolded 
into  the  classes  as  described  below. 

With the observation noted in the previous section, suppose
that x and y are at some stage classed together and that for
strings u, v, g(uxv) ≠ g(uyv); then there is reason to reclassify
one of the words. The algorithm begins with the dictionary as
the only equivalence class; new classes are formed as the need
to create them occurs. Each class is characterized by a unique,
fixed representor word called a prototype. Once a class has
been established, there is no mechanism to remove it from the
partition.

A. A word chosen from the dictionary is used as the prototype
of the (only) word class. This is the tabula rasa hypothesis
about the dictionary: that the partition consists of one class
with the chosen word as the class prototype. Once selected, this
word is fixed for all time as this class prototype. After this
initialization is carried out, the algorithm proceeds according
to the graphical descriptions given below. In these
and in subsequent chapters, the algorithms will be described in
several ways among which are: flow diagrams, contour
diagrams with APL as a medium for the sequential,
procedural, high-level language, and English language
descriptions. Note that these algorithms have been implemented on a
digital computer via an IBM VS APL processor; the dependence
on this "system environment" and on elaborate input-output
protocols has been minimized.

LISTEN produces a string in L(G). ATTENTION generates a pointer
NUM to the string. The mechanism to generate NUM will be
described later.


Figure  2.1 


B. The partition of $V_T$ proceeds as follows:
Suppose that the word x is not the prototype of an
existing class in the partition. Then we must ESTABLISH
BELIEF about x: either x is to be classified in an existing class
or, if this is not possible, then x forms a new class of the
partition.

Note that if the class membership of every word were
determined, then the partition would be known; that is, if we
ESTABLISH BELIEF about a word x relative to the partition,
then this ESTABLISHes BELIEF about the partition itself.

ESTABLISHing BELIEF about a word. The word x is classified to
an existing class CL[k] if the string generated by LISTEN is
grammatical with the prototype of CL[k] substituted for x in
this string. The prototypes of the existing classes are the
elements of PROTOS. These are tested successively (relative to
the current string) beginning with the prototype of the current
class to which x belongs. If the current classification of the
word x is not correct, then the current partition will be
modified as follows. The word x will either be reclassified to
an existing class or, if this is not possible, the word x will
form a new class of the partition; these two cases introduce
a new hypothesis about the relation of the word x to the current
partition. (A more precise statement: the new hypothesis is
formed about the partition from the set of all partitions of the
set $V_T$.) The classification of a word x is called ADJUST; the
formation of a class is called ADD. These are detailed in the
section which follows.
Modifying the Partition: ADJUST and ADD. In a particular
implementation of the algorithm, the dictionary of words is a
linear list. The first element of the list is the prototype
of the initial tabula rasa word class. A second list, called
PROTOS, will record the prototypes.

As noted earlier, if a word x cannot be classified to an
existing class relative to the current string, then a new class
is added to the partition. The word x is to be the
prototype of this new class. The word x is moved to the end
of the list and PROTOS is updated. As classes are formed, the
list of words will be subdivided into sublists which correspond
to the discovered classes of the partition. Each sublist is
initially defined by a prototype.

Suppose that a word x is to be reclassified.
Then it is moved from its current position in the list to the
end of the sublist of the class to which x is believed to belong.
If x is reclassified to the last class formed, then x is
deleted from the list and pushed onto the end of the list
(this is called LAST). Otherwise, x is deleted from
the list and inserted into the list (by a push onto the sublist
which constitutes the correct class, relative to the current
string). The insertion move is called PROMOTE.

Suppose that relative to the current string, x is tested
equivalent to the prototype of its current class. Then within
this (linear) sublist, the word x is moved closer to its
prototype; if x is already adjacent to the prototype, no action is
taken. This type of move is called STRENGTHEN.
Note that in the case that a word is removed from a class
it cannot return to this class. If a word x is to be removed
from a class whose prototype is p, this means that for the
current STRING rewritten as uxv we have

$$g(uxv) \ne g(upv).$$

That is, the STRING which separates x and p has occurred;
there is no reason to retry the class to which p belongs.
Suppose the classes are assigned indices beginning with one
which corresponds to the initial tabula rasa class. We restate
the above by: once x has been promoted to a higher index class,
the class index need never be lowered. In some sense, words are
bubbled through sublists unidirectionally toward the correct
classes. Within the sublist, the movement is toward the class
prototype.

In summary, to ESTABLISH BELIEF about the partition for a
non-prototype word x relative to its occurrence in a string we
either ADD (if the existing classes are rejected as candidates
for the class to which x might belong) or ADJUST. In the case
of ADJUST, we either STRENGTHEN or move the word x via
PROMOTE/LAST.
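
The moves summarized above can be sketched compactly. The Python
fragment below is an illustration, not the APL implementation: it
stores the partition as a list of sublists (prototype first) rather
than as one linear list with a separate list of prototypes, it
assumes that STRENGTHEN moves a word exactly one position toward its
prototype, and it uses a hypothetical toy language as the teacher.
The names establish_belief, find_class and g are invented here.

```python
import re

LANGUAGE = re.compile(r"(a|b)(c|e)*d")  # hypothetical teacher, one letter per word


def g(words):
    """Teacher as acceptor: 1 if the word sequence is in L(G), else 0."""
    return 1 if LANGUAGE.fullmatch("".join(words)) else 0


def find_class(classes, word):
    """Index of the class (sublist) that currently contains the word."""
    return next(k for k, cl in enumerate(classes) if word in cl)


def establish_belief(classes, string, pos):
    """Update the believed partition for the non-prototype word string[pos].

    classes[k][0] is the fixed prototype of class k. Returns the move made.
    """
    x = string[pos]
    current = find_class(classes, x)
    # Test prototypes from the current class upward; the index is never lowered.
    for k in range(current, len(classes)):
        trial = string[:pos] + [classes[k][0]] + string[pos + 1:]
        if g(trial):
            if k == current:
                i = classes[k].index(x)
                if i > 1:  # not yet adjacent to the prototype
                    classes[k][i - 1], classes[k][i] = x, classes[k][i - 1]
                return "STRENGTHEN"
            classes[current].remove(x)
            classes[k].append(x)  # end of the sublist of class k
            return "LAST" if k == len(classes) - 1 else "PROMOTE"
    classes[current].remove(x)
    classes.append([x])  # ADD: x becomes the prototype of a new class
    return "ADD"


if __name__ == "__main__":
    partition = [["a", "c", "d", "b"]]  # tabula rasa: one class, prototype "a"
    for sentence, pos in ((["a", "c", "d"], 1), (["a", "c", "d"], 2), (["b", "d"], 0)):
        print(establish_belief(partition, sentence, pos), partition)
```

Because a word is only ever tested against its current class and the
classes of higher index, a class that has once rejected the word is
never retried.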

C. Suppose that the word x selected is a prototype; then
it defines what is believed to be a class of the partition. In
this case, a REFINEment of the believed partition might be
possible. The procedure described here is made necessary by the
fact that the prototype is fixed for a class which may, for
example, contain words incorrectly classified. In particular,
suppose that for such an incorrectly classified word y the
following is true: every grammatical string which contains y is
grammatical with x substituted for y but not conversely. This is
of course possible and corresponds to a grammar which contains
the following rules:


Rules:

     V1 → xV2
     V1 → yV3
     V2 → [illegible]
     V3 → aV4
     V[illegible] → bV[illegible]
     [illegible] → bV[illegible]

(or an automaton with the following segment):
Suppose that x is the prototype of the word class to which y
is believed to belong. Since every string in L(G) which
contains y is grammatical with x substituted for y, the fact
that they are not equivalent will not be discovered by the
procedure ESTABLISH BELIEF; this could lead to the erroneous
conclusion for long times.

To allow for this REFINEment of the partition,
we proceed as follows: if a word x is presented for
classification and x is a class prototype, then a word y distinct
from x in this class (if any) is selected for the grammaticality
test in the current string. If this word is found to be not
equivalent to x relative to the current string, then this is
sufficient to guarantee that x is not equivalent to y, that y
cannot belong to this class, and so we abduce the new hypothesis
that y belongs to a higher index class. We act on this
hypothesis by a PROMOTE, LAST or ADD as appropriate.
If this word y is found equivalent to x, then by dint of the
grammaticality of the current sentence with the word y substituted
for x the strength of belief in the word y relative to the
partition is increased; that is, the word y is moved toward the
prototype unless it is already adjacent to x in the linear list.

1. For every STRING generated by LISTEN, the ATTENTION
function measures the "strength of belief" of the words in STRING
relative to the current partition by the distance of each word in
STRING from the prototype of its current class. Then the selection
of a word in STRING to be classified to the partition is based on
the random selection of the candidate words weighted by this
distance. Recall that, as the algorithm is carried out, the
strength of belief of a non-prototype word which is successfully,
i.e., in the affirmative, tested equivalent to its believed
prototype is increased by actually moving the word (or its pointer)
toward the prototype in the linear list of words. This is a
concise representation of the partition at any time which has a
"built in" relative measure of the strength of belief of the words
to the sought-for partition.
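
A distance-weighted selection of this kind might look as follows.
This Python sketch is illustrative and makes assumptions that go
beyond the text: the partition is a list of sublists with the
prototype first, the weight of a word is taken to be its distance
plus one (so that a prototype, at distance zero, can still be
selected), and the names belief_distance and attention are
hypothetical.

```python
import random


def belief_distance(classes, word):
    """Distance of a word from the prototype of its current class (0 = prototype)."""
    for cl in classes:
        if word in cl:
            return cl.index(word)
    raise KeyError(word)


def attention(classes, string, rng):
    """Return a pointer NUM into string, favoring weakly believed words."""
    # Assumption: weight = distance + 1, so prototypes remain selectable.
    weights = [belief_distance(classes, w) + 1 for w in string]
    return rng.choices(range(len(string)), weights=weights)[0]


if __name__ == "__main__":
    partition = [["a", "x", "y", "b"], ["c"]]
    rng = random.Random(1)
    picks = [attention(partition, ["a", "b", "c"], rng) for _ in range(10)]
    print(picks)
```

A word far from its prototype has been confirmed less often, so the
scheme spends more of the teacher's sentences on the words whose
classification is least certain.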

2. In the case that the word x selected by ATTENTION is a
prototype, the selection of the word y within this class (if this
class contains at least two words) is carried out in an analogous
way. That is, the selection favors words whose strength of
belief is small.

3. In an early experiment with this algorithm (and the model
based on the data described in detail in the next chapter) the
ATTENTION function selected the word x to be classified by a
simple random scheme. That is, each word in STRING
([illegible]) was equally likely as a candidate for classification
to the partition. This scheme did carry out the partition
discovery at a rate slower than the improved "strength measuring"
scheme.

We note that the weighted selection scheme described above
is made efficient by a device which is only approximately
proportional to strength of belief. The degree of the
approximation [illegible].

[...] The experiments did not include the "sub-algorithm" [...];
in principle, they do not work for any finite [...] grammar. Even
with the REFINE algorithm implemented, which obviously converges
after some sufficiently [...] time, the convergence might be very
slow. This will be made [...] in the next chapter with reference to
particular examples. We note here that this slow convergence is due
to the possibility that a word selected and removed (PROMOTED) from
a large class on the basis of REFINEment might still be in [...]
class! That is, the PROMOTE might move the word to the next higher
index class on the basis of rejection from [...] large class.
However, an inordinately long time, from the practical point of
view, might pass before an "improved" classification may take
place. This could be avoided in a manner similar to that to be used
in state discovery discussed later. It will require some additional
overhead in terms of storage of information, but it will reduce the
discovery time significantly. In future work, this will be done;
that is, words PROMOTED by the REFINEment will be "tagged" to [...]
ALGORITHM: ESTABLISH BELIEF


The descriptions which follow use APL notation. This notation
has proved useful to describe the algorithms and to subsequently
implement and test them on a digital computer. The algorithms
are also described graphically by nested sets of contours
(Johnston [1971]); these are useful to describe the dynamic
processes defined for computer structures. Some of the support
functions and the details of the data structures appear in
appendix A, pp. 121-129.

The monadic function PW MORE processes MORE sentences in
an environment in which TW sentences have already been processed.
The usual graphical topology for PW MORE, i.e. a flow chart, was
drawn in section 3 above.


[Contour diagram of PW MORE. Legible label fragments: "initialize
the sentence ..."; "PW processes MORE sentences"; "... which produces
the string to be processed".]

Figure 2.6

LISTEN invokes REWRITE which generates a sentence in the language;
this string and the updated counter TW are printed.

V STRINC-IIETFN  -,P  iNB 

( P , OpNP-p  fl-Cfi,  ( VTW-7W+1  ) , • 1,  PRINT  ETR  INC— REWRITE 


      ∇ PW MORE;T1;NUM;STRING;X;BELIEF;ACTION
        T1←1
 NEWS:  →OUT IF MORE<T1
        STRING←LISTEN
        X←STRING[NUM←ATTENTION]
        BELIEF←X∊PROTOS
        ACTION←0
 L1:    →L2 IF BELIEF
        BELIEF←ESTABLISH
        →L1
 L2:    →NEXT IF ACTION>0
        ACTION←REFINE
        →L2
 NEXT:  WNOTE X
        T1←T1+1
        →NEWS
 OUT:   →0
      ∇
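The control flow of PW MORE can be paraphrased as follows. This is a minimal Python sketch, not the APL workspace: the callables `listen`, `attention`, `establish`, `refine` and `note` are hypothetical stand-ins for LISTEN, ATTENTION, ESTABLISH, REFINE and WNOTE, and the branch-and-label loop of the listing is read as "ESTABLISH when the selected word is not a prototype, REFINE when it is".

```python
def pw(more, listen, attention, protos, establish, refine, note):
    """Process `more` sentences, following the branch structure of PW MORE.

    All callables are hypothetical stand-ins supplied by the caller:
      listen()               -> next sentence, as a list of words
      attention(sentence)    -> index of the word selected for classification
      establish(x, s, num)   -> action code after classifying a non-prototype
      refine(x, s, num)      -> action code after testing a member of x's class
      note(x, action)        -> report on the action taken
    """
    for _ in range(more):
        sentence = listen()
        num = attention(sentence)
        x = sentence[num]
        if x in protos:
            # The selected word is a class prototype: test a word of its class.
            action = refine(x, sentence, num)
        else:
            # Otherwise classify the selected word against the prototypes.
            action = establish(x, sentence, num)
        note(x, action)
```

Passing the sub-algorithms in as arguments keeps the sketch self-contained; the APL functions communicate through shared variables instead.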


v j J . H 

IiiU'i'UL- . ( C- l C-N  F+  1 w ) L U>J  ?hw  J ) L 1 + TJ-H  J 


ATTENTION  produces  a pointer  to  the  word  in  the  current  sentence 
which  is  selected  for  classification 


V NUM+A  TTFNTTON  ; INPICES  ;.9 ; P \CP  \L  \PP ; RESULT 
INPICES*-C\ PROTO S 
CP*-+f  INPICFS*  . <P*-Cx  S*-~  \ ^STRING 
PP*-P- INPICES C CP ^ 

NUM-RESULT  [ ?oRESUI,T*-{RESULT*-PP«.  [ / PP[  ?2pAl  )/i  L-pSl 


Figure  2.8 


ESTABLISH  carries  out  classification  as  described  in  the  text 


in  earlier  sections. 


Figure  2.9 


For  a list  of  word  class  prototypes  P and  word  x,  LIST  returns 


V BELIEF*-ESTABLISH‘,R 
RELIEF*-  0 

L1:-*0UT  IF  BELIEF 

R*-X  LI  ST  PROTOE 
L2:-L4  IE  RELIEF*  0 =p  R 

' RELIEF*-S  PEAK  W 1 1/? 

L -12 

L4-.-L1  7F  BELIEF 

BEL  IFF*- API)  ~ 

-*L  4 

OUT:-*  0 


that sublist of P beginning with the class to which x is believed
to belong.

Figure  2.10 


      ∇ Z←X LIST P;IX
        IX←C⍳X
        Z←(¯1++/IX≥C⍳P)↓P
      ∇
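One way to read LIST: locate x in the word list C, count the prototypes that sit at or before x, and drop all but the last of them from P. The Python sketch below follows that reading. It assumes that each class is the run of words from one prototype up to the next, and that P is ordered by position in C. The names `list_from`, `c` and `protos` are illustrative, and positions are 0-based.

```python
def list_from(x, protos, c):
    """Return the sublist of `protos` starting at the class believed to hold x.

    c      : the word list (assumed: classes are contiguous runs, each
             starting at its prototype; the first word is a prototype)
    protos : prototypes in the same order as they occur in c
    """
    ix = c.index(x)
    # Number of prototypes located at or before x; the last one heads x's class.
    k = sum(1 for p in protos if c.index(p) <= ix)
    return protos[max(k - 1, 0):]
```

For example, with `c = ["violently", "puppy", "sings", "speaks", "seen"]` and prototypes `["violently", "sings", "seen"]`, the word "speaks" yields the sublist beginning at "sings".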


SPEAKW:  for  word  RJ  substituted  in  the  sentence:  if  grammatical, 

invoke  ADJUST  and  return  BELIEF  true;  otherwise,  BELIEF  returned 


false 


      ∇ BELIEF←SPEAKW RJ;RETURN
        BELIEF←0
        RETURN←~ACCEPT(STRING[⍳NUM-1],RJ,STRING[NUM+⍳(⍴STRING)-NUM])
 L1:    →OUT IF RETURN
        BELIEF←ADJUST RJ
        RETURN←1
        →L1
 OUT:   →0
      ∇


ADJUST carries out modification to the partition (strengthen,
promote, last) as appropriate.


V at  LI  LF—ADJ'dL  T hJilXiiilitl 
' BELIEF-  0 
I a -C  x X 
ACTIO  11-2 

. Ll:-L3  IF  BE  LI  EFv~RJ  = 1 1R-X  LILT  PHOTOS 

RELIEF- ( C \ RJ  ) ~~  1 TlX 
L2-.-L  1 IF  BFLIFF 

BELIEF-STR EN GTRE N 

-1.2  _J 

. L 3 : —OUT  IF  BFLIFF 

L4  ; —Lf>  IF  RELIEFER J 1 1/? 

/?l-«--l+r\/?r  1+RiRJ] 

PEL  IFF- PROMOTE 

—L  4 

L6;-L3  IF  RELIEF 

BEL IFF- LA  ST 
-L& 


OUT:- 0 


Figure 2.12


Moves X nearer to its prototype.


      ∇ Z←STRENGTHEN;I
        C[I]←C[⌽I←(¯1+IX),IX]
        ACTION←3
        Z←1
      ∇


Moves  X to  target  class  RJ  (rightward) 


Figure 2.13


      ∇ Z←PROMOTE
        C←C[(⍳¯1+IX),(IX F R1),IX,R1 F ⍴C]
        ACTION←4
        Z←1
      ∇


Figure 2.14


Move the element X at IX in the list C to the end of the list
(rightward).


      ∇ Z←LAST
        C←C[(⍳¯1+IX),(IX F ⍴C),IX]
        ACTION←5
        Z←1
      ∇


Figure 2.15

F [...] elements at A and returns pointers to [...] of A.


      ∇ Z←A F B
        Z←A+⍳B-A
      ∇
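The three list moves are index permutations of the word list C. The Python sketch below is one reading of the listings, with 0-based positions and a fresh list returned on each call; the function names mirror the APL ones, but the signatures are assumptions made for illustration.

```python
def strengthen(c, ix):
    """Swap the word at ix with its left neighbour (one step toward the prototype)."""
    out = list(c)
    out[ix - 1], out[ix] = out[ix], out[ix - 1]
    return out


def promote(c, ix, r1):
    """Move the word at ix rightward so that it directly follows position r1 (r1 >= ix)."""
    out = list(c)
    word = out.pop(ix)
    # After the pop, the word formerly at r1 sits at r1 - 1, so inserting
    # at r1 places the moved word immediately after it.
    out.insert(r1, word)
    return out


def last(c, ix):
    """Move the word at ix to the end of the list."""
    out = list(c)
    out.append(out.pop(ix))
    return out
```

Because strength of belief is encoded purely by position, none of these moves needs any numeric bookkeeping; the order of C is the whole state.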


Adds X to the partition as a prototype.


      ∇ Z←ADD;IX
        PROTOS←PROTOS,X
        IX←C⍳X
        Z←LAST
        ACTION←6
      ∇


Figure 2.17


REFINE is invoked if the current word is a class prototype;
this algorithm selects a word in the class for testing.


V ACTIO U*REFIHE  \EE  \R \RE  LIEF  \FLAG 
ACTIUtJ-1 

EL*\'lt(.-R-(.l  *R+C\PROTOS  ) , 1+ p C )L  PROTOS  i X ] 
Li  ELI  EF *■  0 = p E Z 
L 1 : -*0 U T IF  RELIEF 


X*C  L ( C i X ) + T IE  A L ?2pp«io  J j 

RELIEF* 0 

R*X  LILT  PRO  TOE 

FLAG*ACCEPT  ETRIiJG  L i HUM-  1 J .X.ETRINGl  RUM*  v ( p STRING  ) -HUM  J 
LV  : -*L  3 IF  FLA  G 

a*l  *R  " 

FLAG *1 
*L2 


u3:-*LL  IF  RELIEF*  0 = pA 


RE  LIEF* A 
*L'I 


ADJUST  lfii 


L 4 :->Ll  IF  RELIEF 


RE  LI LF*ADD 



uUT:*0 


Figure 2.18

WNOTE reports on the action taken.


V k/UUTE  a i FLAG  i A 

A*-  ( 7P  ' ' ) . 'STRINGS  ' . ( lliUM ) . ' J ' 

0-4  , ( Pi:  IUT , J TRIUGi  HUM  J ) , ' 'u'VtLt*'  [ACriO/Vj 
L L A G*-X -GTHI It  Giit  J M J 
LI  :-*OUT  IF  FLAG 
O'  ' .PR IUT  a 
Fi>AG*-i 
-LI 

OUT : ' ' 
Chapter 3

A Fragment of English Grammar

1. In these sections we describe a fragment of English grammar
which serves to test the word discovery algorithm. This grammar
consists of words and rewrite rules with a probability distribution
on the rules. The grammar can be driven to randomly generate
English-like sentences in this syntax-controlled probability
language. Some sample sentences are:

1. It is orange.

2. She dislikes the dog and a chair was not blue while
John speaks.

3. It is not orange.

4. It was green and a dog was not seen by the girl.


2. The 52 words listed below are arranged in 23 labeled word
classes as shown in table 3.1:

LABEL   WORDS       Probability   Descriptor
                    in class

DET     a           3/10          determiner
        some        1/2
        the         1/5
AJH     clever      1/4           human adjective
        short       1/4
        tall        1/4
        young       1/4
AJA     frisky      1/2           animal adjective
        spotted     1/2
AJ1N    fine        1/4           neuter adjective
        new         1/4           group 1
        valuable    1/2
AJ2N    blue        1/3           neuter adjective
        green       1/3           group 2
        orange      1/3
NH      boy         1/4           human noun
        girl        1/4
        man         1/4
        woman       1/4
NA      cat         3/10          animal noun
        dog         1/5
        kitten      3/10
        puppy       1/5
NN      chair       1/3           neuter noun
        desk        1/3
        table       1/3
AUX     is          1/2           auxiliary verb
        was         1/2
VP      helped      1/3           passable verb
        hurt        1/3
        seen        1/3
VT      dislikes    1/2           transitive verb
        likes       1/2
VI      sings       1/2           intransitive verb
        speaks      1/2
CONJ    and         4/5           conjunction
        while       1/5
PRH     he          1/2           human pronoun
        she         1/2
PRN     it          1             neuter pronoun
PNH     John        1/2           human pronoun
        Mary        1/2
PNA     Rover       1/2           animal pronoun
        Touka       1/2
ADV     immensely   1/2           adverb
        violently   1/2
VC      claims      1/2           verb (claim)
        says        1/2
REL     that        1             relative
BY      by          1             instrumental
NOT     not         1             negation
DOT     .           1             period

Table 3.1

3. The phrase structure formulas of the rewrite rules over this
dictionary are:
Formulas                              Comments

S   → NP+AUX+VP1                      neuter
NP  → PRN | DET+(AJ1N)+NN
VP1 → (NOT)+AJ2N                      adjectival predicate

S   → NP+AUX+VP1                      animal
NP  → PNA | DET+(AJA)+NA
VP1 → (NOT)+VP+BY+NP1
NP1 → DET+NA∪NH

S   → NP+AUX+VP2 | NP+VP3             human
NP  → PRH∪PNH
VP2 → (ADV)+VT+NP1 | VI
NP1 → DET+NA∪NH
VP3 → (NOT)+VP+BY+NP2
NP2 → DET+(AJH)+NH

S   → NP+VC+THAT+S                    relatives
NP  → PRH∪PNH

S   → S+CONJ+S                        globals


Table 3.2


4. These rules are expressed in another form below. Introduce
syntactic variables, denoted by 1, 2, 3, ..., NV=18. The generating
algorithm begins by the application of a rule which rewrites
variable 1. The probability of each rule is shown in the last
column.
Table 3.3. Fragment grammar rewrite rules

 #    Rewrite Rule           Probability

 1     1 → DET      2        1/4
 2     1 → PRH∪PNH  6        1/4
 3     1 → PNA      7        1/4
 4     1 → PRN      8        1/4
 5     2 → AJH      3        1/6
 6     2 → AJA      4        1/6
 7     2 → NH       6        1/6
 8     2 → NA       7        1/6
 9     2 → AJ1N     5        1/6
10     2 → NN       8        1/6
11     3 → NH       6        1
12     4 → NA       7        1
13     5 → NN       8        1
14     6 → VI      12        1/5
15     6 → ADV     17        1/5
16     6 → VT       9        1/5
17     6 → AUX     13        1/5
18     6 → VC      18        1/5
19     7 → AUX     13        1
20     8 → AUX     10        1
21     9 → DET     11        1
22    10 → AJ2N    12        1/2
23    10 → NOT     16        1/2
24    11 → NH∪NA   12        1
25    12 → CONJ     1        1/5
26    12 → DOT      0        4/5
27    13 → VP      14        1/2
28    13 → NOT     15        1/2
29    14 → BY       9        1
30    15 → VP      14        1
31    16 → AJ2N    12        1
32    17 → VT       9        1
33    18 → REL      1        1


5. Each word class shown above has a probability distribution
on it which governs the selection of a word for the application
of the rewrite rule as shown in the word table in section 2.

The syntax-controlled probability fragment English grammar in
the finite state form is obtained by the corresponding expansions
of the rules in section 4. When this expansion is carried out
we get 87 rules. For example, the rewrite rules for variable 1
with the probabilities of application are


Rewrite rule          Probability

1 → a      2          3/40
1 → some   2          1/20
1 → the    2          1/8
1 → he     6          1/8
1 → she    6          1/8
1 → Mary   6          1/8
1 → John   6          1/8
1 → Rover  7          1/8
1 → Touka  7          1/8
1 → it     8          1/4
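The count of 87 finite-state rules can be checked by summing, over the 33 class-level rules of table 3.3, the number of words in each class named by the rule. The Python sketch below does this and also confirms that the rule probabilities attached to each syntactic variable sum to one. The data are transcribed from tables 3.1 and 3.3; the names `CLASS_SIZE`, `RULES`, `expanded_rule_count` and `rule_mass` are illustrative.

```python
from fractions import Fraction as F

# Number of words in each class (table 3.1).
CLASS_SIZE = {
    "DET": 3, "AJH": 4, "AJA": 2, "AJ1N": 3, "AJ2N": 3, "NH": 4, "NA": 4,
    "NN": 3, "AUX": 2, "VP": 3, "VT": 2, "VI": 2, "CONJ": 2, "PRH": 2,
    "PRN": 1, "PNH": 2, "PNA": 2, "ADV": 2, "VC": 2, "REL": 1, "BY": 1,
    "NOT": 1, "DOT": 1,
}

# (variable, classes, next variable, probability), as in table 3.3.
RULES = [
    (1, ("DET",), 2, F(1, 4)), (1, ("PRH", "PNH"), 6, F(1, 4)),
    (1, ("PNA",), 7, F(1, 4)), (1, ("PRN",), 8, F(1, 4)),
    (2, ("AJH",), 3, F(1, 6)), (2, ("AJA",), 4, F(1, 6)),
    (2, ("NH",), 6, F(1, 6)), (2, ("NA",), 7, F(1, 6)),
    (2, ("AJ1N",), 5, F(1, 6)), (2, ("NN",), 8, F(1, 6)),
    (3, ("NH",), 6, F(1)), (4, ("NA",), 7, F(1)), (5, ("NN",), 8, F(1)),
    (6, ("VI",), 12, F(1, 5)), (6, ("ADV",), 17, F(1, 5)),
    (6, ("VT",), 9, F(1, 5)), (6, ("AUX",), 13, F(1, 5)),
    (6, ("VC",), 18, F(1, 5)),
    (7, ("AUX",), 13, F(1)), (8, ("AUX",), 10, F(1)), (9, ("DET",), 11, F(1)),
    (10, ("AJ2N",), 12, F(1, 2)), (10, ("NOT",), 16, F(1, 2)),
    (11, ("NH", "NA"), 12, F(1)),
    (12, ("CONJ",), 1, F(1, 5)), (12, ("DOT",), 0, F(4, 5)),
    (13, ("VP",), 14, F(1, 2)), (13, ("NOT",), 15, F(1, 2)),
    (14, ("BY",), 9, F(1)), (15, ("VP",), 14, F(1)),
    (16, ("AJ2N",), 12, F(1)), (17, ("VT",), 9, F(1)), (18, ("REL",), 1, F(1)),
]


def expanded_rule_count():
    """One word-level rule per word of every class named in a class-level rule."""
    return sum(CLASS_SIZE[c] for _, classes, _, _ in RULES for c in classes)


def rule_mass():
    """Total probability of the rules rewriting each variable."""
    totals = {}
    for var, _, _, p in RULES:
        totals[var] = totals.get(var, F(0)) + p
    return totals
```

Exact fractions are used so that the sum-to-one check does not depend on floating-point rounding.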


6. The graph of the corresponding finite state automaton is
shown below. For clarity, the arcs between the nodes are labeled
with word class names. The target state for the terminating rules
is denoted by F.
7. The syntax-controlled probability language L(G) is represented
by data structures as described in [...] in APL. The
functions REWRITE (which produces a sentence in the probabilistic
language L(G)) and ACCEPT (which returns the truth value of
u ∈ L(G) for any u ∈ V*) are useful representations of L(G) in the
case that the deterministic form of the grammar is given. This latter
remark is of course no restriction on the use of these procedures
since the construction of the minimal state deterministic
completely specified automaton (and its isomorphic language) [...]
is effective. A preprocessor can be implemented to prepare the
data structures RIX and LES. The details of the operation of these
functions are left for another section.
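The roles of REWRITE and ACCEPT can be illustrated directly from tables 3.1 and 3.3. The Python sketch below is not the APL implementation and does not use the RIX and LES structures; the names `WORDS`, `RULES`, `rewrite` and `accept` are illustrative. Two assumptions are made: numbers are relative weights rather than probabilities, and when a rule names a union of classes the class is chosen uniformly, which the tables do not specify.

```python
import random

# Table 3.1: class -> {word: relative weight within the class}.
WORDS = {
    "DET": {"a": 3, "some": 5, "the": 2},
    "AJH": {"clever": 1, "short": 1, "tall": 1, "young": 1},
    "AJA": {"frisky": 1, "spotted": 1},
    "AJ1N": {"fine": 1, "new": 1, "valuable": 2},
    "AJ2N": {"blue": 1, "green": 1, "orange": 1},
    "NH": {"boy": 1, "girl": 1, "man": 1, "woman": 1},
    "NA": {"cat": 3, "dog": 2, "kitten": 3, "puppy": 2},
    "NN": {"chair": 1, "desk": 1, "table": 1},
    "AUX": {"is": 1, "was": 1}, "VP": {"helped": 1, "hurt": 1, "seen": 1},
    "VT": {"dislikes": 1, "likes": 1}, "VI": {"sings": 1, "speaks": 1},
    "CONJ": {"and": 4, "while": 1}, "PRH": {"he": 1, "she": 1},
    "PRN": {"it": 1}, "PNH": {"John": 1, "Mary": 1},
    "PNA": {"Rover": 1, "Touka": 1}, "ADV": {"immensely": 1, "violently": 1},
    "VC": {"claims": 1, "says": 1}, "REL": {"that": 1}, "BY": {"by": 1},
    "NOT": {"not": 1}, "DOT": {".": 1},
}

# Table 3.3: state -> [(classes, next state, relative weight)]; 0 is final.
RULES = {
    1: [(("DET",), 2, 1), (("PRH", "PNH"), 6, 1), (("PNA",), 7, 1), (("PRN",), 8, 1)],
    2: [(("AJH",), 3, 1), (("AJA",), 4, 1), (("NH",), 6, 1),
        (("NA",), 7, 1), (("AJ1N",), 5, 1), (("NN",), 8, 1)],
    3: [(("NH",), 6, 1)], 4: [(("NA",), 7, 1)], 5: [(("NN",), 8, 1)],
    6: [(("VI",), 12, 1), (("ADV",), 17, 1), (("VT",), 9, 1),
        (("AUX",), 13, 1), (("VC",), 18, 1)],
    7: [(("AUX",), 13, 1)], 8: [(("AUX",), 10, 1)], 9: [(("DET",), 11, 1)],
    10: [(("AJ2N",), 12, 1), (("NOT",), 16, 1)],
    11: [(("NH", "NA"), 12, 1)],
    12: [(("CONJ",), 1, 1), (("DOT",), 0, 4)],
    13: [(("VP",), 14, 1), (("NOT",), 15, 1)],
    14: [(("BY",), 9, 1)], 15: [(("VP",), 14, 1)], 16: [(("AJ2N",), 12, 1)],
    17: [(("VT",), 9, 1)], 18: [(("REL",), 1, 1)],
}
CLASS_OF = {w: c for c, ws in WORDS.items() for w in ws}


def rewrite(rng):
    """Generate one sentence (list of words) by a random walk from state 1 to state 0."""
    state, out = 1, []
    while state != 0:
        rules = RULES[state]
        classes, state, _ = rng.choices(rules, weights=[r[2] for r in rules])[0]
        cls = rng.choice(classes)  # assumption: uniform choice within a union
        words = WORDS[cls]
        out.append(rng.choices(list(words), weights=list(words.values()))[0])
    return out


def accept(sentence):
    """True if the word sequence drives the automaton from state 1 to state 0."""
    state = 1
    for w in sentence:
        cls = CLASS_OF.get(w)
        nxt = [n for classes, n, _ in RULES.get(state, []) if cls in classes]
        if not nxt:
            return False
        state = nxt[0]
    return state == 0
```

With these definitions `accept("he violently likes the man .".split())` holds, while the substituted string "he valuable likes the man ." is rejected; this is the kind of substitution test the classification algorithm relies on.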

The support function PRINT produces the "external" form of
the words in the language L(G) which are internally represented
most conveniently as integers. That is, the variables
[...].

The modular APL functions carry out the abduction process
which they formally describe. These functions can be easily
modified to create variations of the algorithm and to assist in
the design of extensions of this work. In particular, in addition
to several variations, tests of robustness of word class discovery,
a test for behavior to compare to a mathematical model, and the
like were carried out with this abduction machine. An implementation
in some more "production" oriented language is called for in
the case of application of these techniques.

An experiment has been designed which enables a participant
to act as the generator and the acceptor of strings in some
[...]. That is, the functions REWRITE and ACCEPT require
[...] environment to carry out [...]. The results of these
experiments will be reported in later work.

8. The [...] of the experiment which follows is typical
(modulo the random element) of the results of the experiments
[...] for the abduction machine. Some [...] for statistical
analysis are presented in [...]; [...] will be carried out in later
work when the computer implementations are oriented for the
production of statistics in a large scale, cost-effective manner.

The order within the dictionary is initially randomized.
In the following experiment, the random number generator seed is
[...] probabilistic simulations; that is to say, it enables
reproducibility of results. This is essential to make the
procedure "portable". This parameter setting is not essential
to the interpretation of the results obtained. As can be seen
from the word list below (compare section 2) the order of the
words is scrambled by the function BIRTH. The fixed word class
prototype of the tabula rasa word class relative to this scrambled
order is "violently".

The function PW MORE is invoked with a value of MORE, viz.,
how many MORE sentences are to be processed. The sentence
generated by REWRITE is PRINTed, followed by the word selected by
ATTENTION (STRING[NUM]). The special symbols indicate the
classification of the word X←STRING[NUM]. In the case that X is
a prototype, the word tested [...] is printed after the
symbol. The symbols are to be interpreted as follows:

+  ADD a class to the partition
↑  PROMOTE
←  STRENGTHEN
"  cannot STRENGTHEN: word is adjacent to the fixed prototype
ω  selected word is PROTOS; word tested by REFINE is believed
   equivalent.
→  word moved to last class.

9. In this experiment we note an early, rapid word class
discovery with the time between discovery of classes increasing;
this same "learning curve" characteristic is observed with respect
to the number of words correctly classified. In the table below
we summarize the establishment of classes with the word class
PROTOtypes indicated.

TIME (sentence no.)   CLASS NO.   PROTOS

        1                 2       violently, sings
        3                 3       seen
        4                 4       by
        5                 5       a
        6                 6       that
        7                 7       Mary
       10                 8       frisky
       11                 9       girl
       12                10       table
       14                11       was
       17                12       it
       20                13       not
       21                14       short
       23                15       new
       25                16       likes
       28                17       orange
       70                18       and
       89                19       says
      238                20       cat
      852                21       Rover
[Session printout, sentences preceding no. 16: not legible in the scan.]
. X I^-Taa  lih* 
16. TOUKA IS SEEN BY THE KITTEN .
    STRING[6] KITTEN ↑

17. IT WAS BLUE .
    STRING[1] IT +

18. SHE WAS NOT HURT BY SOME WOMAN .
    STRING[7] WOMAN ↑

19. TOUKA IS SEEN BY THE WOMAN .
    STRING[5] THE

20. IT WAS NOT BLUE .
    STRING[3] NOT +

21. SOME SHORT BOY CLAIMS THAT MARY SPEAKS .
    STRING[2] SHORT +

22. MARY SPEAKS .
    STRING[1] MARY " JOHN

23. THE NEW CHAIR IS NOT BLUE WHILE HE SPEAKS .
    STRING[2] NEW +

24. IT WAS NOT GREEN .
    STRING[1] IT ω

25. SHE IMMENSELY LIKES THE CAT .
    STRING[3] LIKES +

26. THE YOUNG BOY SAYS THAT TOUKA WAS HURT BY THE WOMAN .
    STRING[6] TOUKA ↑

27. SHE SINGS .
    STRING[1] SHE ↑

28. THE FINE TABLE WAS NOT ORANGE .
    STRING[6] ORANGE +

29. IT IS GREEN .
    STRING[2] IS ↑

30. IT IS NOT ORANGE .
    STRING[1] IT ω


31. [...] IMMENSELY DISLIKES SOME WOMAN WHILE SOME TABLE WAS [...] AND A
    YOUNG WOMAN VIOLENTLY LIKES THE DOG WHILE ROVER IS NOT SEEN BY
    THE MAN .
    STRING[25] THE

32. MARY LIKES THE CAT .
    STRING[3] THE

33. MARY IMMENSELY DISLIKES THE CAT .
    STRING[5] CAT ↑

34. HE LIKES A CAT .
    STRING[1] HE ↑

35. THE KITTEN IS SEEN BY THE CAT .
    STRING[6] THE

36. ROVER WAS NOT SEEN BY SOME CAT .
    STRING[1] ROVER ↑

37. JOHN WAS NOT HELPED BY A MAN WHILE THE FINE CHAIR WAS BLUE .
    STRING[13] BLUE [...]

38. IT IS ORANGE .
    STRING[1] IT ω

39. IT WAS GREEN .
    STRING[1] IT ω

40. THE BOY LIKES A CAT .
    STRING[1] THE

41. IT IS NOT BLUE .
    STRING[4] BLUE

42. HE SINGS .
    STRING[1] HE ←

43. ROVER WAS SEEN BY A CAT .
    STRING[2] WAS " IS

44. THE KITTEN IS SEEN BY A PUPPY .
    STRING[1] THE

45. TOUKA WAS HELPED BY THE CAT .
    STRING[3] HELPED ↑

[...]

51. HE VIOLENTLY LIKES THE MAN .
    STRING[2] VIOLENTLY ↑ VALUABLE

[...] HE SINGS .
    STRING[2] SINGS ↑ VALUABLE

[...] THE VALUABLE CHAIR IS NOT ORANGE .
    STRING[2] VALUABLE ↑


● VIOLENTLY
  PUPPY
  IMMENSELY
  GREEN
  YOUNG
  WHILE
  AND
  DOG
  CLEVER
  CHAIR
  SAYS
  SPOTTED
  MAN
  DISLIKES
  [illegible]
  [illegible]
  [illegible]
  FINE
  CLAIMS
  VALUABLE
  BOY
● SINGS
  SPEAKS
● SEEN
  HURT
  HELPED
● BY
● A
  THE
  SOME
● THAT
● MARY
  JOHN
  TOUKA
  HE
  SHE
  ROVER
● FRISKY
● GIRL
  KITTEN
  WOMAN
  CAT
● TABLE
● WAS
  IS
● IT
● NOT
● SHORT
● NEW
● LIKES
● ORANGE
  BLUE


Table 3.[...]. Partition of the vocabulary at t = [...].

(The dark circle at the left of a word designates
the prototype.)


In this experiment, the subalgorithm REFINE is invoked for
the first time at sentence 22; it affirms the belief that, in
the sentence "Mary speaks.", the word "John", which
on the basis of sentence [...] was classed equivalent to "Mary",
tests equivalent to the prototype "Mary". Moreover, since the
position of the word "John" is adjacent to "Mary" in the list of
words, the current strength of belief cannot be strengthened
further.

At the 51st sentence, the word "valuable" is ejected from the
first class by REFINE and the word is moved to target class two;
the move is made on the basis of the non-grammaticality of the
word "valuable" in place of the word "violently" in the current
sentence. That is, the sentence "He valuable likes the man." is
not grammatical in the fragment grammar. The word "valuable" is
moved to target class [...] at t=102; it is correctly classed with
"new" at t=135.

It is possible for a word class to be discovered for a word
which does not appear in any sentence up to some time t. (Clearly,
if a word does not appear for any t [...], then it is removed from
the dictionary.) This is illustrated by the "migration" of the
word "clever" toward its class: "clever" is ejected from the first
class at time 1137, and subsequent promotions are made at sentence
nos. 1515, 1629, 1657 and 1678.

In early experiments which did not include the subalgorithm
REFINE, the word classes in this fragment grammar were discovered
anyway. In particular, the classes NA and NH were determined;
this was due to the "adjectives", that is, the classes AJA and AJH.
To see this, note that, having established as classes [frisky]
and [girl] (at t = 11) we have the sentence

    The frisky cat is seen by a girl.

[...] selects the word "cat" for classification; then

    The frisky [girl] is seen by a girl.

is non-grammatical [...] word class.

Note that, without [...] and without adjectives, the
partitioning process is dependent upon the order in which the
classes are determined; that is, if the class [cat] is established
before [girl], then the discovery process converges to the
sought-for partition.

10. An experiment in "robustness" was carried out by the
introduction of a non-systematic error in the teacher. That is, the
acceptor was modified by (randomly) negating the response 10% of
the time. In this way, a grammatical sentence is reported
non-grammatical and a non-grammatical sentence is reported
grammatical.
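The imperfect teacher amounts to wrapping the acceptor so that its answer is negated with a fixed probability. The Python sketch below illustrates this; the wrapper name `noisy`, its `error_rate` parameter and the use of a seeded `random.Random` are assumptions made for the example, not details of the original APL modification.

```python
import random


def noisy(accept, error_rate=0.10, rng=None):
    """Wrap an acceptor so that its verdict is negated with probability error_rate."""
    rng = rng or random.Random()

    def teacher(sentence):
        answer = accept(sentence)
        # With probability error_rate the teacher reports the opposite verdict.
        return (not answer) if rng.random() < error_rate else answer

    return teacher
```

Seeding the generator, for example `noisy(accept, 0.10, random.Random(1))`, keeps a noisy run reproducible.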

The subalgorithms LIST and ADJUST were also modified. The
list of words is treated as a circular list; movement of words
during classification occurs both rightward and leftward. A
subalgorithm DEMOTE was introduced to move words left.

At any stage of the algorithm, if the word to be classified
cannot be classed to an established class, then it forms a new
class. The result of this experiment with a bidirectional flow
of words is that "spurious" word classes begin to appear. There
is no mechanism for the consolidation of word classes; every
word will eventually form its own class.


Figure 3.3. Modified ADJUST


V Z -DEMOTE 

C-C[ ( \R1 ), TXtR\  FpCl 

ACTION-1 

Z-l 


Figure 3.4. DEMOTE


Experimental [...] Analysis

1. After t sentences have been generated, we have observations
$(r, z_r)$, for $r = 1, 2, \ldots, t$, where $z_r$ represents the
number of words correctly classified or the total number of
discovered word classes after the rth sentence has been processed.
These data are the result of a probabilistic process so that each
experiment from "BIRTH" will have different characteristics; it
is hoped that these differences might be slight. It has been
suggested that statistical regression might be applied to the
results of several experiments. Another procedure might be to
average over the several experiments the datum observed at each r
and then to apply the data analysis described below to this
smoothed, preprocessed data. Neither of these has yet been done
nor explored further at this time.


2. The graphical representation of the data in figure 4.1, which
is depicted as continuous for convenience, visually suggests
"learning curve" properties. The data plotted on a semilog scale
appears in figure 4.2. A least squares straight line fit to
$(r, \ln(z_a - z_r))$, where $z_a$ denotes an asymptotic value, was
determined and is superposed in figure 4.2 and was used to obtain
the "fit" shown in figure 4.2 (Davis [1963]). The true asymptotic
value, which in these experiments is of course known a priori,
[...]. However, this fit does not have the important property
which underlies the theoretical model: the tabula rasa hypothesis,
which makes natural a constrained least squares fit to force the
curve through the point (1,1); this point corresponds to the
initial equivalence class.


Figure 4.1. Number of words correctly classed v. number of sentences.

3. That is, letting $y_r = \ln(z_a - z_r)$, we seek m and b to minimize

$$M = \sum_{r=1}^{t} (y_r - mr - b)^2,$$

subject to the constraint $y_1 = \ln(z_a - 1) = m + b$.
This enables us to determine that

$$m = [\mathbf{y} - \mathbf{1}\ln(z_a - 1)] \cdot \mathbf{u} / (\mathbf{u} \cdot \mathbf{u}),$$

$$b = -m + \ln(z_a - 1),$$

where the t-vectors $\mathbf{u} = (r - 1)$, $\mathbf{y} = (y_r)$ and
$\mathbf{1} = (1, 1, \ldots, 1)$ have been introduced for clarity. The
line is thus $y = mr + b$ which yields the experimental formula to be

$$z_r = z_a - \exp(mr + b) = z_a - (z_a - 1)\exp(-m)\exp(mr).$$

The data presented in table 4.1 has been analyzed according to
this plan; the results of this analysis appear in table 4.1.
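The closed form for m and b is a one-parameter least squares fit once the constraint b = ln(z_a - 1) - m is substituted. The Python sketch below computes it; the function names `constrained_fit` and `model` are illustrative, and the data used to exercise it are synthetic rather than the experiment's observations.

```python
import math


def constrained_fit(r, z, z_a):
    """Fit ln(z_a - z_r) = m*r + b, forced through the point (1, ln(z_a - 1)).

    r, z : equal-length sequences of sentence numbers and observed counts
    z_a  : asymptotic value (every z must be strictly below it)
    """
    y1 = math.log(z_a - 1.0)
    y = [math.log(z_a - zi) for zi in z]
    u = [ri - 1.0 for ri in r]
    # m = [y - 1*ln(z_a - 1)] . u / (u . u)
    m = sum((yi - y1) * ui for yi, ui in zip(y, u)) / sum(ui * ui for ui in u)
    b = y1 - m
    return m, b


def model(r, m, b, z_a):
    """Experimental formula z_r = z_a - exp(m*r + b)."""
    return z_a - math.exp(m * r + b)
```

By construction `model(1, m, b, z_a)` equals 1 for any data, which is the tabula rasa constraint.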

4. The Euclidean norm of the residual vector can of course be
reduced by the direct application of the least squares procedure
to the form A+B exp(Cr) with the observed data. The constrained
problem: find A, B and C to minimize

$$M = \sum_{r=1}^{t} (z_r - A - B\exp(Cr))^2$$

subject to the tabula rasa constraint $z_1 = A + B\exp(C)$ can be
readily computed as follows:

The transcendental equation in C

$$(\mathbf{z} - \mathbf{1}) \cdot (\mathbf{v}' - [(\mathbf{v} \cdot \mathbf{v}')/(\mathbf{v} \cdot \mathbf{v})]\mathbf{v}) = 0,$$

where $\mathbf{v} = (\exp(C) - \exp(Cr))$ and $\mathbf{v}' = d\mathbf{v}/dC$, is
solved for C and this value is used to determine
$B = -\mathbf{v} \cdot (\mathbf{z} - \mathbf{1})/(\mathbf{v} \cdot \mathbf{v})$ and $A = 1 - B\exp(C)$.
This has been done and the results are also recorded in table 4.1
and depicted graphically in figure 4.4.
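For a fixed C the constrained problem is linear in B, so the fit reduces to a one-dimensional search over C. The Python sketch below uses the closed forms for B and A given above, but it does not solve the transcendental equation itself: it minimizes the same residual sum of squares directly, with a coarse scan followed by a golden-section refinement. The search interval, grid size and function names are assumptions of the sketch.

```python
import math


def profile(c, r, z):
    """For fixed C return (A, B, residual sum of squares) under A + B*exp(C) = 1."""
    v = [math.exp(c) - math.exp(c * ri) for ri in r]
    b = -sum(vi * (zi - 1.0) for vi, zi in zip(v, z)) / sum(vi * vi for vi in v)
    a = 1.0 - b * math.exp(c)
    # z_r - A - B*exp(C*r) simplifies to (z_r - 1) + B*v_r under the constraint.
    rss = sum((zi - 1.0 + b * vi) ** 2 for vi, zi in zip(v, z))
    return a, b, rss


def fit(r, z, lo=-1.0, hi=-1e-4, grid=400, iters=100):
    """Minimize the profiled residual over C in [lo, hi] (interval is an assumption)."""
    def rss(c):
        return profile(c, r, z)[2]

    # Coarse scan to bracket the minimum, then golden-section refinement.
    cs = [lo + i * (hi - lo) / grid for i in range(grid + 1)]
    i = min(range(grid + 1), key=lambda j: rss(cs[j]))
    left, right = cs[max(i - 1, 0)], cs[min(i + 1, grid)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        x1 = right - g * (right - left)
        x2 = left + g * (right - left)
        if rss(x1) < rss(x2):
            right = x2
        else:
            left = x1
    c = (left + right) / 2.0
    a, b, _ = profile(c, r, z)
    return a, b, c
```

A stationary point of the profiled residual satisfies the transcendental equation above, so the two routes agree wherever the search interval contains the solution.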


Figure 4.4: Three superposed graphs as described in sections 3
and 4. (Vertical axis: # words correctly classed.)


TABLE 4.1

   r    Δr     z

   0
   1     1     1
   2     1     2
   3     1     3
   5     2     4
   6     1     5
   7     1     6
   8     1     7
   9     1     8
  10     1     9
  11     1    10
  12     1    11
  13     1    12
  16     3    13
  17     1    14
  18     1    15
  22     4    16
  24     2    17
  25     1    18
  26     1    19
  27     1    20
  28     1    21
  29     1    22
  30     1    23
  31     1    24
  32     1    25
  36     4    26
  37     1    27
  42     5    28
  49     7    29
  50     1    30
  52     2    31
  57     5    32
  62     5    33
  68     6    34
  73     5    35
  80     7    36
  81     1    37
  96    15    38
 114    18    39
 121     7    40
 140    19    41
 145     5    42

Section 3:

    m = -.013216
    exp(-.013216) = .98687
    z = 52[1 - .99382(.98687)^r]

Section 4:

    A = 42.440
    B = -42.493
    C = -.025089
    z = 42.44[1 - 1.00125(.97522)^r]
5. A conjecture based on visual observation is that the learning
characteristics fall between the values determined in section 3
and section 4; the sought-for characteristics would better
represent both the early (rapid) learning and the asymptotic
qualities, viz., the time to determine all the words correctly. A
heuristic procedure based on the observed data could be implemented;
this would especially be of use if predictive qualities of the model
were desired. In the least squares procedure, a weighting
function appropriately chosen would improve the fit. We mention
here that the true asymptotic value (e.g. the number of words to
be classified), which is known a priori, denoted here by $z_a$, could
be introduced as an additional constraint to reformulate the
least squares problem as: find C to minimize

$$\sum_{r=1}^{t} \left( z_r - z_a[1 - (1 - 1/z_a)\exp(C(r-1))] \right)^2.$$

This leads to a transcendental equation in C

$$[(\mathbf{z} - \mathbf{1}z_a)/(1 - z_a) - \exp(C\mathbf{u})] \cdot \mathbf{u} = 0,$$

where the t-vector $\mathbf{u} = (r - 1)$.

Chapter 5

A Mathematical Model

1. Consider an "incidence matrix" of word equivalents whose
entries are updated as the partitioning procedure is carried
out. All the entries are initially set to some number $p_0$,
$0 < p_0 < 1$. At each stage of the procedure, the entry
corresponding to the pair of words selected for testing is either
augmented or set to zero according as the test result for this
pair is "believed equivalent" or "not equivalent" respectively;
other entries are unchanged. Thus the x,y entry in the matrix
after the r+1st stage is

$$p_{xy}(r+1) = \begin{cases} p_{xy}(r) & \text{the pair } x,y \text{ not tested} \\ 0 & \text{discovered that } x \not\equiv y \\ f(p_{xy}(r)) & \text{strengthened belief that } x \equiv y \end{cases}$$

where f(·) denotes a function which augments entries in the
believed equivalent case.
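The three-case update can be written out directly. In the Python sketch below the matrix is held as a dictionary keyed by word pairs, and the augmenting function defaults to the square root, chosen only because it is concave, maps [0,1] into itself, exceeds x on (0,1) and fixes 1; the text does not specify f, so this choice and the name `update` are assumptions.

```python
import math


def update(p, pair, outcome, f=math.sqrt):
    """Return the belief matrix after one stage.

    p       : dict mapping a word pair to its current entry
    pair    : the pair selected at this stage
    outcome : None  -> pair not tested (entry unchanged)
              False -> discovered non-equivalent (entry set to zero)
              True  -> believed equivalent (entry augmented by f)
    """
    q = dict(p)
    if outcome is True:
        q[pair] = f(p[pair])
    elif outcome is False:
        q[pair] = 0.0
    return q
```

An entry set to zero stays at zero under later augmentation, since f(0) = 0 for this choice of f.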

2. The rate of convergence of this matrix to the true incidence
matrix of the infinitary equivalence relation will be delayed by
a non-zero probability of testing two words for equivalence and
getting an "incorrect" result. That is, if two words x and y are
not equivalent it is possible that, for example, out of 100
sentences which involve the word x only 20 of these sentences
would separate x from y; i.e. only 20 of these sentences would be
ungrammatical with the word y substituted for the word x. This
suggests that the ratio of the number of sentences which do not
separate x from y to the total number of sentences involving x
(in the case that x and y are not equivalent) be considered as a
candidate for the aforementioned probability; note that this
quantity, which will be denoted by epsilon, depends on each pair
of words. If epsilon is zero, then grammatical sentences
involving x separate x from y when they are not equivalent; in
such a case, the non-equivalent words are separated as soon as
they are presented together for testing. If epsilon is one, then
every grammatical sentence involving x is also grammatical with
y substituted for x and hence x [...].


3. The rate of convergence will also depend upon the frequency
with which pairs of words are brought forth for comparison with
respect to the equivalence relation. For fixed words x and y
which are not equivalent, it is of interest to compute the rate
at which the corresponding matrix entry $p_{xy}$ converges to zero.
Since the underlying process is probabilistic, we propose to
compute the mean rate at which such an entry converges to zero.
This will indicate how to estimate the mean time to determine
the true incidence matrix and hence the mean time to determine
all non-equivalent words. Let $c_{xy}$ denote the probability that
words x and y are brought forth for comparison. Then the expected
value of the sum of the possible entries $p_{xy}(r)$ for
non-equivalent words is estimated by

$$E\Big[\sum_{x \not\equiv y} p_{xy}(r)\Big] = \sum_{x \not\equiv y} E[p_{xy}(r)] \le (\text{no. of non-eq. pairs}) \cdot \sup_{x \not\equiv y} \{E[p_{xy}(r)]\}.$$


The augmentation function $f(\cdot)$ is a concave function which increases an entry in the matrix from the initial (tabula rasa) value $p_1$ to a value according as the strength of belief of equivalence of the corresponding pair of words increases. The r-th iterate of this function, which enters the estimate of the rate of convergence, behaves asymptotically like the function $1 - ab^r$, as will be shown below; such a choice is natural for the function which is to indicate increasing strength of belief in the word class equivalence of words. Let $f^{r*}(x)$ denote the r-th iterate of the function f(x) which maps the interval [0,1] into itself; moreover, assume that f(x) is continuous, that f'(x) exists in (0,1], f(x) > x, and f(1) = 1. Then by the successive application of the mean value theorem,

$$\begin{aligned}
f^{(r+1)*}(x) &= f[f^{r*}(x)] \\
&= f[f^{r*}(x)] - f(1) + f(1) \\
&= 1 - \{f(1) - f[f^{r*}(x)]\} \\
&= 1 - k_r[1 - f^{r*}(x)] \\
&= 1 - k_r\{1 - [1 - k_{r-1}(1 - f^{(r-1)*}(x))]\} \\
&= 1 - k_r k_{r-1}[1 - f^{(r-1)*}(x)] \\
&\;\;\vdots \\
&= 1 - (k_r k_{r-1} \cdots k_1)[1 - f(x)] \\
&= 1 - \Big(\prod_{i=0}^{r} k_i\Big)(1-x),
\end{aligned}$$


where the $k_i$ are constants (values of the derivative of $f(\cdot)$ at the appropriate intermediate value).

4. Let $x_0 > 0$, so $x_1 = f(x_0) = 1 - f'(\xi_0)(1 - x_0)$, where $\xi_0 \in (x_0, 1)$. Denote $f'(\xi_0)$ by $k_0$, $p_1$ by $x_0$, and $f'(1)$ by $\kappa$, where we assume $0 < \kappa < 1$. Then

$$k_0 = \frac{1 - x_1}{1 - x_0} < 1.$$

In the same way,

$$x_2 = f(x_1) = 1 - k_1(1 - x_1), \quad \ldots, \quad x_{r+1} = f(x_r) = 1 - k_r(1 - x_r),$$

where

$$k_r = \frac{1 - x_{r+1}}{1 - x_r} < 1.$$

Now let $K = \prod_{i=0}^{r} k_i$ and write

$$k_i = k_i + \kappa - \kappa = (k_i - \kappa) + \kappa = \kappa\Big[1 + \frac{k_i - \kappa}{\kappa}\Big].$$

Denote $\prod_{i=0}^{r}\big(1 + \frac{k_i - \kappa}{\kappa}\big)$ by $\Pi(r)$. Then $K = \kappa^{r+1}\,\Pi(r)$ and

$$1 - x_{r+1} = \kappa^{r+1}(1 - x_0)\,\Pi(r).$$

Assume that f' satisfies $|f'(x) - f'(y)| \le \lambda|x - y|$ for some constant $\lambda$; then $|k_r - f'(1)| = |k_r - \kappa| \le \lambda|x_r - 1|$ for all r; hence the series $\sum|k_i - \kappa|$ is dominated by $\lambda\sum(1 - x_i)$, and the latter converges since $k_i < 1$ (i.e., the ratio test). Thus $\Pi(r)$ converges to a value which we denote by $\Pi$. This proves that

$$f^{(r+1)*}(p_1) \sim 1 - \Pi\,\kappa^{r+1}(1 - p_1),$$

that is, the iterates have the form $1 - ab^r$ with $a = \Pi(1 - p_1)$ and $b = \kappa$.

5. Suppose that we focus our attention on a fixed pair of non-equivalent words x and y. After t sentences in the training sequence have been processed, suppose that k of them involved this pair for testing; then the x,y entry in the matrix is

$$p_{xy}(t) = f^{k*}(p_1),$$

the k-th iterate of the augmentation function. This entry is reached only in the case that k trials took place with x and y believed equivalent; each such trial has probability $\varepsilon$, as described in section 2. Hence,

$$E[p_{xy}(t) : \text{of which } k \text{ trials involve } x,y] = f^{k*}(p_1)\,\varepsilon^k,$$

where $0 \le k \le t$.
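The asymptotic behaviour of the iterates can be checked numerically. The following is a minimal sketch, not the report's APL implementation: the particular concave function `f` and the constants `KAPPA` and `GAMMA` are assumptions chosen only so that f(1) = 1, f(x) > x on [0,1) and f'(1) = `KAPPA`. It computes the mean-value constants $k_r = (1-x_{r+1})/(1-x_r)$ and the running product $\Pi(r)$.

```python
KAPPA = 0.5   # assumed value of f'(1)
GAMMA = 0.3   # assumed curvature term; keeps f concave and f(x) > x on [0, 1)


def f(x):
    """Hypothetical augmentation function with f(1) = 1 and f'(1) = KAPPA."""
    return 1.0 - KAPPA * (1.0 - x) - GAMMA * (1.0 - x) ** 2


def iterate(func, x0, r):
    """Return [x0, f(x0), ..., r-th iterate of f at x0]."""
    xs = [x0]
    for _ in range(r):
        xs.append(func(xs[-1]))
    return xs


X0 = 0.1
xs = iterate(f, X0, 25)

# k_r = (1 - x_{r+1}) / (1 - x_r): the constants supplied by the mean value theorem.
ks = [(1.0 - xs[i + 1]) / (1.0 - xs[i]) for i in range(len(xs) - 1)]

# Pi(r) = prod_{i<=r} k_i / kappa; it should settle to a finite limit.
pis = []
running = 1.0
for k in ks:
    running *= k / KAPPA
    pis.append(running)

print("last k_r :", ks[-1])
print("Pi(r)    :", pis[-1])
print("1 - x_r  :", 1.0 - xs[-1], "vs", (1.0 - X0) * KAPPA ** len(ks) * pis[-1])
```

With these assumed constants the ratios $k_r$ approach `KAPPA` and $\Pi(r)$ stabilises, which is the behaviour the argument above relies on.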


6. Recall that c denotes the probability that words x and y are brought forth for comparison; then for non-equivalent words we have

$$\begin{aligned}
E[p_{xy}(t)] &= \sum_{k=0}^{t} \binom{t}{k} c^k (1-c)^{t-k}\,\varepsilon^k f^{k*}(p_1) \\
&\sim \sum_{k=0}^{t} \binom{t}{k} (c\varepsilon)^k (1-c)^{t-k}\,[1 - ab^k] \\
&= [(c\varepsilon) + (1-c)]^t - a[(c\varepsilon b) + (1-c)]^t \\
&\sim [(c\varepsilon) + (1-c)]^t, \quad \text{since } 0 < b < 1;
\end{aligned}$$

thus,

$$E[p_{xy}(t)] \sim [1 - c(1-\varepsilon)]^t.$$


7. For a given pair of words which is equivalent, after t sentences the x,y entry will be

$$p_{xy}(t) = p(t) = f^{k*}(p_1)$$

for k sentences (out of the t) which involve this pair. Then the mean value of the entry is

$$E[p(t)] = \sum_{k=0}^{t} \binom{t}{k} c^k (1-c)^{t-k} f^{k*}(p_1) \sim 1 - a[(cb) + (1-c)]^t.$$
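The two mean values derived in paragraphs 6 and 7 follow from the binomial theorem once $f^{k*}(p_1)$ is replaced by $1 - ab^k$. The sketch below checks that step numerically; the function names and the parameter values are illustrative assumptions, not taken from the report.

```python
from math import comb


def mean_entry(t, c, a, b, eps=1.0):
    """Binomial sum  sum_k C(t,k) (c*eps)^k (1-c)^(t-k) (1 - a*b^k).

    eps = 1 corresponds to an equivalent pair (paragraph 7);
    eps < 1 corresponds to a non-equivalent pair (paragraph 6).
    """
    return sum(
        comb(t, k) * (c * eps) ** k * (1 - c) ** (t - k) * (1 - a * b ** k)
        for k in range(t + 1)
    )


def closed_form(t, c, a, b, eps=1.0):
    """Closed form obtained by applying the binomial theorem to each term."""
    return (c * eps + 1 - c) ** t - a * (c * eps * b + 1 - c) ** t


# Illustrative (assumed) parameter values.
T, C, A, B, EPS = 50, 0.1, 0.8, 0.6, 0.3
print(mean_entry(T, C, A, B, EPS), closed_form(T, C, A, B, EPS))
print(mean_entry(T, C, A, B), closed_form(T, C, A, B))
```

For an equivalent pair (`eps = 1`) the first term of the closed form is exactly 1, giving $1 - a[1 - c(1-b)]^t$.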


To summarize, after t sentences have been generated, the corresponding entry in this "incidence matrix" has mean value, for a large number of trials t, given by

$$E[p_{xy}] \sim \begin{cases} 1 - a[1 - c(1-b)]^t & \text{if } x = y \\ [1 - c(1-\varepsilon)]^t & \text{otherwise.} \end{cases}$$

9. Rate of Convergence.

A. The syntax-controlled probability sentence generator is a random mechanism which produces sentences S in L(G). These events are used to determine word classes. Consider non-equivalent words x and y; the following describes quantities which estimate the rate of convergence to the sought-for partition. We simplify the algorithm as follows: consider the family of events $\{E_\sigma(x,y),\ \sigma \in V^*\}$, where the $E_\sigma(x,y)$ denote tests of the equivalence of x and y, and each has value 0 or 1. The set $V^*$ can be partitioned into

$$A_{xy} = \{\sigma \in V^* : E_\sigma(x,y) = 1\}$$

$$\bar{A}_{xy} = \{\sigma \in V^* : E_\sigma(x,y) = 0\}.$$

Note that $A_{xy}$ includes in its definition string events which do not contain x.

Since $x \ne y$ we are hopeful that an $E_\sigma(x,y)$ might occur with $\sigma \in \bar{A}_{xy}$; but our reality is that an event with $\sigma \in A_{xy}$ occurs with a probability which we denote by $\pi$, where $\pi$ depends on the pair x,y:

$$\mathcal{P}\{E_\sigma(x,y) : \sigma \in A_{xy}\} = \pi,$$

so that for the desired events

$$\mathcal{P}\{E_\sigma(x,y) : \sigma \in \bar{A}_{xy}\} = 1 - \pi.$$


The events occur (t = 1, 2, ...) and we can tabulate

sentence number     probability of success
1                   $1 - \pi$
2                   $\pi(1 - \pi)$
3                   $\pi^2(1 - \pi)$
...
t                   $\pi^{t-1}(1 - \pi)$.

The mean sentence number for success is given by

$$1\cdot(1-\pi) + 2\cdot\pi(1-\pi) + \cdots + t\cdot\pi^{t-1}(1-\pi) + \cdots = (1-\pi)\sum_{t} t\,\pi^{t-1} = \frac{1-\pi}{(1-\pi)^2} = \frac{1}{1-\pi}.$$
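The mean sentence number is the mean of a geometric distribution. As a quick check, the sketch below sums the series term by term and compares it with $1/(1-\pi)$; the function name, the truncation length and the value of $\pi$ are assumptions made for illustration.

```python
def mean_sentence_number(pi, terms=10_000):
    """Partial sum of  sum_{t>=1} t * pi**(t-1) * (1 - pi)."""
    return sum(t * pi ** (t - 1) * (1 - pi) for t in range(1, terms + 1))


PI = 0.9  # assumed probability that a sentence fails to separate x and y
print(mean_sentence_number(PI), 1 / (1 - PI))
```

The closer $\pi$ is to one, the longer the expected wait before a separating sentence appears.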


The probability $\pi$ consists of two parts: the probability that a sentence does not contain the word x and the probability $\varepsilon$, where epsilon was described earlier. In the next section we derive formulas for these quantities.

B. To compute $\pi$ proceed as follows: define

$$A_0 = \{S \in V^* : S \in L,\ S \text{ does not contain } x\}$$

$$A_k = \{S \in V^* : S \in L,\ |S| \ge k \text{ and first occurrence of } x \text{ is at position } k\},$$

where |S| denotes the length of a string S. Then

$$L = A_0 \cup \bigcup_{k=1}^{\infty} A_k.$$

For any string S in $A_0$, the probability attached to this string is

$$\mathcal{P}(S : S \in A_0) = \sum_{k=1}^{\infty} \mathcal{P}\{S : S \in A_0,\ |S| = k\}.$$

Denote by $P(\xi) = \{p_{ij}(\xi)\}$ the probability associated with the continuing rewrite rules $i \to \xi j$, and by $r(\xi) = \{r_i(\xi)\}$ the probability associated with the terminating rules $i \to \xi$. Let $d = [1,0,0,\ldots,0]^T$,

$$P = \sum_{\xi \in V,\ \xi \ne x} P(\xi) \quad \text{and} \quad r = \sum_{\xi \in V,\ \xi \ne x} r(\xi).$$

Then

$$\mathcal{P}\{S : S \in A_0\} = d^T(r + Pr + P^2 r + \cdots) = d^T(I - P)^{-1} r$$

as developed in Grenander [196?].

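To estimate epsilon, introduce a transformation $J_k : V^* \to V^*$ defined for strings $|S| \ge k$ as the string $J_k(S)$ formed by replacement of the k-th word in S by the word y. Define

$$B_k = \{S : J_k(S) \in L,\ S \in A_k\}, \quad k \ge 1,$$

and $B_0 = A_0$. Then these sets can be identified as precisely those which contain strings which cannot separate words x and y. The computation of the probability attached to a string in $B_k$, which we write as

$$\mathcal{P}\{S : S \in B_k\} = \mathcal{P}\{g(S) = g(J_k(S)),\ S \in A_k\},$$

is carried out with the introduction of the following: let

$$\tau_i(x) = \begin{cases} 1 & r_i(x) > 0 \\ 0 & \text{else} \end{cases} \quad \text{and} \quad t_{ij}(x) = \begin{cases} 1 & p_{ij}(x) > 0 \\ 0 & \text{else.} \end{cases}$$

Then

$$\begin{aligned}
\mathcal{P}(B_1) ={}& r_1(x)\tau_1(y) + \sum_{\xi_2}\sum_{i_1,j_1} p_{1i_1}(x)\,r_{i_1}(\xi_2)\,t_{1j_1}(y)\,\tau_{j_1}(\xi_2) \\
&+ \sum_{\xi_2,\xi_3}\sum_{i_1,i_2,j_1,j_2} p_{1i_1}(x)\,t_{1j_1}(y)\,p_{i_1i_2}(\xi_2)\,t_{j_1j_2}(\xi_2)\,r_{i_2}(\xi_3)\,\tau_{j_2}(\xi_3) + \cdots \\
&+ \sum_{(\xi)}\sum_{(i),(j)} p_{1i_1}(x)\,t_{1j_1}(y)\,p_{i_1i_2}(\xi_2)\,t_{j_1j_2}(\xi_2)\cdots p_{i_{L-2}i_{L-1}}(\xi_{L-1})\,t_{j_{L-2}j_{L-1}}(\xi_{L-1})\,r_{i_{L-1}}(\xi_L)\,\tau_{j_{L-1}}(\xi_L) + \cdots,
\end{aligned}$$

where the multi-index $(i) = (i_1, i_2, \ldots, i_{L-1})$.

Introduce the tensors

$$S(x,y) = [p_{ik}(x)\,t_{il}(y)]$$

$$c(x,y) = [r_i(x)\,\tau_i(y)]$$

$$M = \sum_{\xi} P(\xi) \otimes T(\xi) = \Big[\sum_{\xi} p_{ij}(\xi)\,t_{kl}(\xi)\Big]$$

and

$$N = \sum_{\xi} r(\xi) \otimes \tau(\xi) = \Big[\sum_{\xi} r_i(\xi)\,\tau_j(\xi)\Big],$$

where $\otimes$ indicates the Kronecker product.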


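Then

$$\mathcal{P}\{S : S \in B_1\} = d^T\Big[c(x,y) + S(x,y)\sum_{L=2}^{\infty} M^{L-2} N\Big].$$

In like manner,

$$\mathcal{P}\{S : S \in B_2\} = d^T\Big[P\,c(x,y) + P\,S(x,y)\sum_{L=3}^{\infty} M^{L-3} N\Big]$$

$$\mathcal{P}\{S : S \in B_k\} = d^T\Big[P^{k-1} c(x,y) + P^{k-1} S(x,y)\sum_{L=k+1}^{\infty} M^{L-(k+1)} N\Big].$$

Since $\sum_{L=k+1}^{\infty} M^{L-(k+1)} N = (I - M)^{-1} N$ for k = 1, 2, ..., we get

$$\mathcal{P}\{S \in B_1 \text{ or } S \in B_2 \text{ or } \ldots \text{ or } S \in B_k \text{ or } \ldots\} = d^T \sum_{k=1}^{\infty} P^{k-1}\big[c(x,y) + S(x,y)(I - M)^{-1} N\big].$$

Using $\sum_k P^k = (I - P)^{-1}$, we obtain

$$\varepsilon = \frac{d^T (I - P)^{-1}\big[c(x,y) + S(x,y)(I - M)^{-1} N\big]}{1 - \mathcal{P}(A_0)}.$$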

The terms and factors can be interpreted as follows:

the i-th component of the vector $d^T(I - P)^{-1}$ is the probability of getting to state i without writing the word x;

the array S(x,y) selects the appropriate interactions, if any, for a transition from state i to k writing x with probability $p_{ik}(x)$ and from state i to l writing y;

the k,l element of the array $(I - M)^{-1}N$, which is independent of x and y, corresponds to the probability that a path be taken which exits state k and leads to the final state writing the same substring as the one written by some path from state l to the final state.

The factor $(I - M)^{-1}N$, denoted by Q, can be written as

$$Q = MQ + N.$$

By dint of its interpretation as a probability, the diagonal entries $q_{ii} = 1$; thus Q can be computed from the tensors M and N by iteration.

This is important because Q is used to compute the delay in state discovery. Note also that since M is formed from the matrices $P(\xi)$ and $T(\xi)$, it is possible to carry out the iterations required without the explicit formation of the tensor M; the non-zero entries of this array can be formed as needed for computation.

From the preceding equations, we get

$$\pi = \sum_{k=0}^{\infty} \mathcal{P}(B_k) = d^T (I - P)^{-1}\big[r + c(x,y) + S(x,y)\,Q\big].$$
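The iteration for Q can be sketched in a few lines. The example below is an illustration only: it uses the non-zero entries of M and N tabulated for the toy grammar later in this chapter (Table 5-5), stores them sparsely, and assumes the index convention $q_{ik} = \sum_{j,l} m_{ijkl}\,q_{jl} + n_{ik}$. The function name `iterate_q` and the number of sweeps are hypothetical choices.

```python
# Non-zero entries m[i, j, k, l] of M for the toy grammar (Table 5-5).
M = {
    (1, 2, 1, 2): 1.0, (1, 2, 8, 9): 1.0,
    (2, 3, 2, 3): 0.5, (2, 4, 2, 4): 0.5, (2, 3, 9, 10): 0.5, (2, 4, 9, 10): 0.5,
    (3, 5, 3, 5): 1.0, (3, 5, 4, 5): 1.0,
    (4, 5, 3, 5): 0.5, (4, 5, 4, 5): 0.5, (4, 8, 4, 8): 0.5,
    (5, 6, 5, 6): 0.5, (5, 7, 5, 7): 0.5, (5, 7, 6, 7): 0.5,
    (6, 7, 5, 7): 1.0, (6, 7, 6, 7): 1.0,
    (7, 8, 7, 8): 1.0,
    (8, 9, 1, 2): 1.0, (8, 9, 8, 9): 1.0,
    (9, 10, 2, 3): 0.5, (9, 10, 2, 4): 0.5, (9, 10, 9, 10): 1.0,
    (10, 1, 10, 1): 0.2,
}
# The only non-zero entry of N.
N = {(10, 10): 0.8}


def iterate_q(m, n, size=10, sweeps=50):
    """Iterate Q <- M Q + N starting from the identity, using sparse M and N.

    Assumed contraction: q[i, k] = sum over (j, l) of m[i, j, k, l] * q[j, l] + n[i, k].
    """
    states = range(1, size + 1)
    q = {(i, k): 1.0 if i == k else 0.0 for i in states for k in states}
    for _ in range(sweeps):
        new_q = {key: n.get(key, 0.0) for key in q}
        for (i, j, k, l), weight in m.items():
            new_q[(i, k)] += weight * q[(j, l)]
        q = new_q
    return q


Q = iterate_q(M, N)
off_diagonal = {key: val for key, val in Q.items() if key[0] != key[1] and val != 0.0}
print(off_diagonal)
```

Only the listed non-zero entries of M are visited in each sweep, which is the point made above about not forming the full tensor explicitly.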


C.1. The calculation of $\pi$ is carried out for a grammar which is based on the fragment grammar introduced earlier. This new grammar, with fewer states and fewer words, isolates the problems attendant on word class discovery and further reduces the burden of numerical calculations for examples. In addition, this new grammar will help to fix ideas in the chapter.


The following "toy grammar" was extracted from the fragment grammar: the 10 word classes (which include the class containing the single word ".") partition the word dictionary. There are 11 state variables: the final state, denoted by F, and 10 states which correspond to the variables of the grammar governed by 39 rewrite rules which are the expansions of the phrase structure formulas:

S → NP + AUX + VP1
NP → DET + NA | DET + NH
VP1 → (NOT) + VP + BY + NP1
NP1 → DET + NA∪NH
S → S + CONJ + S

The rewrite rules are shown below with the probabilities:

State    Word class    Next state    Probability
1   →    DET           2             1
2   →    NA            3             .5
2   →    NH            4             .5
3   →    AUX           5             1
4   →    AUX           5             .5
4   →    VT            8             .5
5   →    NOT           6             .5
5   →    VP            7             .5
6   →    VP            7             1
7   →    BY            8             1
8   →    DET           9             1
9   →    NH∪NA         10            1
10  →    CONJ          1             .2
10  →    DOT           F             .8

Table 5.2


These expand into the 39 rules as follows:

Rewrite rule               Cumulative transition probability
(state → word, next state) (for each variable rewritten)
1 → 11   2                 0.3
1 → 12   2                 0.8
1 → 13   2                 1
2 → 14   3                 0.15
2 → 15   3                 0.25
2 → 16   3                 0.4
2 → 17   3                 0.5
2 → 18   4                 0.625
2 → 19   4                 0.75
2 → 20   4                 0.875
2 → 21   4                 1
3 → 22   5                 0.5
3 → 23   5                 1
4 → 22   5                 0.25
4 → 23   5                 0.5
4 → 27   8                 0.75
4 → 28   8                 1
5 → 32   6                 0.5
5 → 24   7                 0.66667
5 → 25   7                 0.83333
5 → 26   7                 1
6 → 24   7                 0.33333
6 → 25   7                 0.66667
6 → 26   7                 1
7 → 29   8                 1
8 → 11   9                 0.3
8 → 12   9                 0.8
8 → 13   9                 1
9 → 18   10                0.125
9 → 19   10                0.25
9 → 20   10                0.375
9 → 21   10                0.5
9 → 14   10                0.65
9 → 15   10                0.75
9 → 16   10                0.9
9 → 17   10                1
10 → 33  F                 0.8
10 → 30  1                 0.96
10 → 31  1                 1

Table 5-3

The words in the dictionary are represented above by the integers 11, ..., 33. They are shown below with the "probability within the word class".

Word        Integer    Probability within the word class
A           11         0.3
THE         12         0.5
SOME        13         0.2
CAT         14         0.3
DOG         15         0.2
KITTEN      16         0.3
PUPPY       17         0.2
BOY         18         0.25
GIRL        19         0.25
MAN         20         0.25
WOMAN       21         0.25
IS          22         0.5
WAS         23         0.5
HELPED      24         0.33333
HURT        25         0.33333
SEEN        26         0.33333
DISLIKES    27         0.5
LIKES       28         0.5
BY          29         1
AND         30         0.8
WHILE       31         0.2
NOT         32         1
.           33         1

Table 5.4


The graph of the corresponding finite state automaton is:

3. The matrices $P(\xi) = \{p_{ij}(\xi)\}$ consist of 2300 entries, many of which are zero. For any word $\xi$, $P(\xi) \otimes T(\xi)$ is extremely sparse so that M is sparse. The iterative form for the calculation of Q is:

$$Q^{(p+1)} = MQ^{(p)} + N, \quad p = 0, 1, 2, \ldots, \qquad Q^{(0)} = I,$$

where I denotes the 10x10 identity matrix. This simultaneous linear system of equations in the 90 unknowns $q_{ij}$, $i \ne j$, can be solved directly for the example in this section. We observe that the iterations converge because the eigenvalues of M have moduli less than one.

The only non-zero entries in M are listed below:

Indices
i     j     k     l     m_ijkl
1     2     1     2     1
1     2     8     9     1
2     3     2     3     1/2
2     4     2     4     1/2
2     3     9     10    1/2
2     4     9     10    1/2
3     5     3     5     1
3     5     4     5     1
4     5     3     5     1/2
4     5     4     5     1/2
4     8     4     8     1/2
5     6     5     6     1/2
5     7     5     7     1/2
5     7     6     7     1/2
6     7     5     7     1
6     7     6     7     1
7     8     7     8     1
8     9     1     2     1
8     9     8     9     1
9     10    2     3     1/2
9     10    2     4     1/2
9     10    9     10    1
10    1     10    1     0.2

Table 5-5

The only non-zero entry in N is

$$n_{10,10} = \sum_{\xi} r_{10}(\xi)\,\tau_{10}(\xi) = r_{10}(33)\,\tau_{10}(33) = 0.8.$$

After simple arithmetic manipulation, the non-zero, off-diagonal entries of Q are

$$q_{34} = 1, \quad q_{43} = 1/2, \quad q_{56} = 1/2, \quad q_{65} = 1.$$

4. In the following sample computations for specified words x and y, let

$$P_1 = \mathcal{P}\{g(S) = g(J_k(S)),\ S \in A_k,\ k \ge 1\}.$$

Suppose x = {a}; if y = {a}, then $P_1$ = 56.5%.

$$\mathcal{P}\{S \in A_k,\ k \ge 1\} = 1 - \mathcal{P}(A_0) = 1 - 0.435 = 56.5\%.$$

Hence we compute $\pi$ and find

$\mathcal{P}$\{S : S does not separate x and y\} = 1.

If y = {cat}, then $P_1$ = 0 and

$\mathcal{P}$\{S : S does not separate x and y\} = 0.

Suppose x = {cat}; then $\mathcal{P}(A_0)$ is calculated as

$\mathcal{P}$\{S ∈ L : S does not contain {cat}\} = 67.6%.

If y = {cat}, then $P_1$ = 32.4% and

$\mathcal{P}$\{S : S does not separate x and y\} = 1.

If y = {boy}, then $P_1$ = 32.4% and

$\mathcal{P}$\{S : S does not separate x and y\} = 1.

Also, $\mathcal{P}\{J_k(S) \in L : S \in A_k,\ k \ge 1\}$ = 32.4% and hence for these x,y

$\varepsilon$ = .324/(1 - .676) = 1.

Suppose x = {boy};

$\mathcal{P}$\{S ∈ L : S does not contain {boy}\} = 72.3%.

If y = {cat}, then $P_1$ = 20.3% and

$\mathcal{P}$\{S : S does not separate x and y\} = 92.6%.

That  is,  for  the  toy  grammar  the  mean  sentence  number  of  success 
is  given  by  1/(1  — 92.3%)  = 13.  To  relate  this  to  the  experiment 
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j ' rf  raw  i , n t<  that  the  averagf  sentence  length  Ls  • . ■ • • 

" : y " ! s • • , ■ : ■ ■ • typ<  ' ’ ■ ass  wl tl  sev  n ther  w rds . 

1 } ft;  y)  S separates}  = r )?{  b ■ -y  | ? > *-^{  -at  1 1 v 

A Tuir  imate  for  this  t robabil  II  y is 

= (0.077)(g^inr)(i) 

= 0.001117^3?. 

This small probability indicates that the waiting time for the separation of these two words is long.
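The probabilities $\mathcal{P}(A_0)$ used in the sample computations for the toy grammar can be checked by summing the series $d^T(r + Pr + P^2r + \cdots)$ directly. The sketch below is an illustration under stated assumptions: the arc list is transcribed from Tables 5.2 and 5.4 (rule probability times probability within the word class), lower-case word spellings are used, and the names `ARCS`, `scaled` and `prob_without` are hypothetical.

```python
DET = {"a": 0.3, "the": 0.5, "some": 0.2}
NA = {"cat": 0.3, "dog": 0.2, "kitten": 0.3, "puppy": 0.2}
NH = {"boy": 0.25, "girl": 0.25, "man": 0.25, "woman": 0.25}
AUX = {"is": 0.5, "was": 0.5}
VP = {"helped": 1 / 3, "hurt": 1 / 3, "seen": 1 / 3}
VT = {"dislikes": 0.5, "likes": 0.5}


def scaled(word_class, rule_prob):
    """Probability of writing each word of a class on an arc with the given rule probability."""
    return {word: rule_prob * p for word, p in word_class.items()}


# (state, next state, {word: probability}); "F" is the final state.
ARCS = [
    (1, 2, scaled(DET, 1.0)),
    (2, 3, scaled(NA, 0.5)), (2, 4, scaled(NH, 0.5)),
    (3, 5, scaled(AUX, 1.0)),
    (4, 5, scaled(AUX, 0.5)), (4, 8, scaled(VT, 0.5)),
    (5, 6, {"not": 0.5}), (5, 7, scaled(VP, 0.5)),
    (6, 7, scaled(VP, 1.0)),
    (7, 8, {"by": 1.0}),
    (8, 9, scaled(DET, 1.0)),
    (9, 10, {**scaled(NH, 0.5), **scaled(NA, 0.5)}),
    (10, 1, {"and": 0.16, "while": 0.04}),
    (10, "F", {".": 0.8}),
]


def prob_without(x, arcs=ARCS, sweeps=500):
    """Approximate d^T (r + P r + P^2 r + ...) with every arc writing x removed."""
    mass = {1: 1.0}          # d = [1, 0, ..., 0]: all mass starts in state 1
    total = 0.0
    for _ in range(sweeps):  # each sweep writes one more word
        nxt = {}
        for src, dst, words in arcs:
            w = mass.get(src, 0.0) * sum(p for word, p in words.items() if word != x)
            if dst == "F":
                total += w   # terminating rule: contributes to the r term
            else:
                nxt[dst] = nxt.get(dst, 0.0) + w
        mass = nxt
    return total


for word in ("a", "cat", "boy"):
    print(word, round(prob_without(word), 3))
```

Under these assumptions the values agree, to three decimals, with the figures 0.435, 67.6% and 72.3% quoted in the sample computations.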




Chapter  6 

Pis  ' rithn 

1. In the previous chapter, the word classes were determined by an "interactive" algorithm; that is, an algorithm which, depending on a training sequence, establishes the partition of the dictionary of words. The algorithm is based on the principle: an equivalence relation defined on a set induces a partition into equivalence classes. In the case of states, this situation is established by the following observation. A finite state grammar G defined on the word dictionary induces a finite index equivalence relation EQ on V*, defined as follows: let u, v be in V*; if g(uz) = g(vz) for all z ∈ V*, then u EQ v, where g(·) is the grammaticality function introduced earlier. This equivalence relation defines "initial string" equivalence classes which partition V* into the states of the sought-for automaton.

The synthesis of this completely specified, deterministic finite state automaton is carried out by a process which is modeled after the word class discovery process as closely as is possible. The differences are noted in the next section. A string in L(G) is produced. A transduction of this string replaces each word by its word class prototype, and the resultant string of prototypes is processed from left to right. The string is produced by a sequence of transitions beginning with the distinguished "start" state (V ← 0) to the final state. The tabula rasa hypothesis is that this start state is the only state of the automaton. The algorithm decomposes this state into the states of the sought-for automaton. The empty string NULL is the representor of this state; VPROTOS is a list introduced to keep track of the state representors. The algorithm begins
with  the  start  state  and  its  successor  word  (the  first  word  of 
the  current  string);  the  initial  string  represented  by  this  pair 
is  classified  to  a state  of  the  partition  under  construction.  The 
"target"  state  is  noted  and  the  state  equivalence  class  of  this 
state  and  its  successor  word  is  determined,  and  so  on  in  this 
fashion.  A graphical  overview  of  the  algorithm  is  given  below 
in  flow  diagram  form: 


Figure  6.1 

2.a. The algorithm is based on the observation that if g(uz) ≠ g(vz) for any strings u, v and some z, then u and v are not equivalent. As in the case with word class partitioning, we begin with one equivalence class (V ← 0) and the algorithm will split this state into the states of the automaton as required; the concept of a fixed prototype for each state is retained, and it is now called a representor. The new states are introduced into the partition as they are "discovered" (i.e., established by the algorithm).

Each  string  is  analyzed  from  left  to  right  with  a depth 
pointer  D which  keeps  track  of  the  depth  of  the  construction 
relative  to  this  current  string.  The  string  is  completely 
analyzed;  that  is,  the  synthesis  is  carried  out  for  the  entire 
length  of  each  string  from  the  start  state  to  the  final  state. 

b.  The  representation  of  a state  V followed  by  a word  S[D] 
is  denoted  by  V F S[D].  Moreover,  this  pair  (V,S[D])  is  decoded 
as  an  integer  to  radix  NC  (the  number  of  discovered  word  classes). 
As  the  construction  of  the  automaton  is  carried  out,  this  list  of 
non-decimal  radix  integers  can  be  recursively  unwound  in  an  obvious 
way  to  relate  a string  and  its  production  sequence  relative  to  the 
current  belief  about  the  partition.  That  is  to  say,  the  structural 
derivation  of  a string  is  developed  by  the  algorithm  directly  from 
the  string.  This  scheme  eliminates  the  need  for  a priori  knowledge 
of  all  possible  combinations  of  words  which  might  serve  as  initial 
strings  to  which  the  partitioning  is  applied;  the  algorithm  builds 
a list  of  initial  string  codes  as  they  occur;  it  is  this  list, 
denoted  by  VC,  to  which  this  abduction  algorithm  for  state 
discovery  applies. 

c. At some stage in the algorithm (i.e., the t-th sentence, depth D and having just established that the initial string S[ιD-1] belongs to state V), suppose that the initial string S[ιD], which we denote by V F S[D], has occurred for the first time. Then this pair cannot yet be the representor of any state; i.e., it cannot be among the elements of the list of state representors (VPROTOS). Therefore, we ESTABLISH BELIEF about this new initial string as follows: either V F S[D] is to be classified to an existing state of the partition, or, if it is not equivalent to any of these states, it becomes the representor of a newly CREATEd state.

d. ESTABLISHing BELIEF about an initial string. Suppose that a new initial string being classified is V F S[D]. Then to ESTABLISH BELIEF about this initial string we successively replace it by the representors of the existing states, beginning with the representor of the start state, which is NULL, and test the resultant string (e.g. NULL, D↓S) to determine the relation of this new initial string to the current, established partition.

If a state representor is rejected, then the algorithm will either move the new initial string to another state of the partition or CREATE a new state (if all the existing states are rejected). In either of these two cases, the string is moved to a higher index state. Note that it is possible that for some string z ∈ V*, g(uz) = g(vz), but u and v are not equivalent (and this has yet to be discovered). However, once an initial string has been moved out of a state to a higher index state, it is never moved back to a lower index state.

e. To ESTABLISH BELIEF about V F S[D] which has already "occurred" at an earlier step (and which is not itself a state representor), replace this initial string by the representor of the state to which it is believed to belong. If this state is rejected, the next higher index state becomes the candidate for the "target" state to classify this string.

f. If the initial string V F S[D] is a state representor, this means that at some earlier stage in the process carried out by this algorithm this pair was established as a new state of the partition into states; denote this state by V'. As in the case of word class discovery, a refinement of the partition might be possible. In particular, some initial string u ∈ V' (where u is distinct from the representor) is selected to test in the current sentence in the place of the pair V F S[D]; the non-grammaticality of the "test" initial string is sufficient to reject the hypothesis that the initial string u belongs to the state V', and u is moved to a higher index state. This modifies the partition. As will be elaborated in a later section, we note that the string u is "tagged" in the higher index state since it was moved without a sentence in which it may have occurred. The tagging will expedite some future selection of this initial string (i.e. when it appears in some future sentence) for classification to a state of the partition.

g.  The  establishment  of  states,  the  classification  of  initial 
strings  to  states,  and  the  code  scheme  (initial  strings  as  integer 
codes)  determine  the  partition  of  V*  to  within  a renumbering  of  the 
states  of  the  original  automaton.  That  is,  the  states  and  branches 
are  determined.  A simple  TALLY  scheme  is  used  to  tabulate  the 
frequency  defined  probabilities  of  the  transitions  between  states. 

3.a. The flow diagram below depicts the component parts of the process for ESTABLISHing BELIEF. The algorithm is applied to examples in this section to illustrate these steps and to illustrate the algorithm REFINEV, which is analogous to REFINE in word class discovery. In addition, a technique to discover whether or not loops of a particular kind might occur is introduced; this speeds state discovery at a small increase in computation.


ALGORITHM:  Establish  belief  about  an  initial  string 

Figure  6.2 

b. Consider the grammar over {a,b,c}: 1 → a1, 1 → b2, 2 → b3, 2 → c, 3 → a1, 3 → c, and the corresponding automaton model

Figure 6.3

where the period, ".", is a symbol added to label the arc to the added final state.

For the purposes of this example, the probabilities are not needed; moreover, the word classes are represented by the letters themselves. It will be useful to describe the automaton graphically during the various stages of the construction. Initially, it is as shown

Figure 6.4

The (only) initial state is the start state; the algorithm must decompose or split this state into the sought-for states of the automaton. The algorithm will be applied to the training sequence abc, bbc, abbabc.


T←1   S←abc
V←0   D←1
  0 F a ∈ VPROTOS?  NO.   RV←NULL
    SPEAKV: (NULL,bc) ∈ L?  YES.  0 F a TARGETTED TO 0
    CHECK: ((0 F a)F a,bc) ∈ L?  i.e., (aabc) ∈ L?  YES.
V←0   D←2
  0 F b ∈ VPROTOS?  NO.   RV←NULL
    SPEAKV: (NULL,c) ∈ L?  NO.  CREATE
V←1   D←3
  1 F c ∈ VPROTOS?  NO.   RV←NULL b
    SPEAKV: (NULL) ∈ L?  NO.
    SPEAKV: (b) ∈ L?  NO.  CREATE
V←2   FINAL

T←2   S←bbc
V←0   D←1
  0 F b ∈ VPROTOS?  YES.
V←1   D←2
  1 F b ∈ VPROTOS?  NO.   RV←NULL b bc
    SPEAKV: (NULL,c) ∈ L?  NO.
    SPEAKV: (b,c) ∈ L?  YES.  1 F b TARGETTED TO 1
    CHECK: ((1 F b)F b,c) ∈ L?  i.e., (bbb,c) ∈ L?  NO.  REJECT
    SPEAKV: (bc,c) ∈ L?  NO.  CREATE
V←3   D←3
  3 F c ∈ VPROTOS?  NO.   RV←NULL b bc bb
    SPEAKV: (NULL) ∈ L?  NO.
    SPEAKV: (b) ∈ L?  NO.
    SPEAKV: (bc) ∈ L?  YES.  3 F c TARGETTED TO 2
    CHECK: 3 F c IS ((0 F b)F b)F c
           2 IS NOT IN THE HISTORY OF 3.  YES.
V←2   FINAL

T←3   S←abbabc
V←0   D←1
  0 F a ∈ VPROTOS?  NO.   RV←NULL b bc bb
    SPEAKV: (NULL,bbabc) ∈ L?  YES.  0 F a TARGETTED TO 0
    CHECK: ((0 F a)F a,bbabc) ∈ L?  i.e., (aabbabc) ∈ L?  YES.
V←0   D←2
  0 F b ∈ VPROTOS?  YES.
V←1   D←3
  1 F b ∈ VPROTOS?  YES.
V←3   D←4
  3 F a ∈ VPROTOS?  NO.   RV←NULL b bc bb
    SPEAKV: (NULL,bc) ∈ L?  YES.  3 F a TARGETTED TO 0
    CHECK: 3 F a IS ((0 F b)F b)F a
           0 IS IN THE HISTORY OF 3
           ((((((0 F b)F b)F a)F b)F b)F a,bc) ∈ L?  i.e., (bbabbabc) ∈ L?  YES.
V←0   D←5
  0 F b ∈ VPROTOS?  YES.
V←1   D←6
  1 F c ∈ VPROTOS?  YES.
    REFINEV: (3 F c) ∈ L?  i.e., (bbc) ∈ L?  YES.
V←2   FINAL

Figure 6.5


c. The function SPEAKV determines whether or not a "trial" target state (a provisional target state) is to be accepted or rejected. That is, suppose that $V_{D-1}$ F S[D] is being classified. Suppose that V' is the state being considered as the target state and that this state has representor V F X. Then if the sentence (V F X), D↓S is not grammatical, the state V' is rejected as the target state for this initial string. Suppose however that it is grammatical. Then the state $V_{D-1}$ is unwound to explicitly give the structural derivation of the initial string being classified; this is called the history of this initial string.

If the trial target state is among the states in the history of the initial string being classified, then the algorithm can quickly verify that the implied cycle of states is indeed possible. This is done by trying the implied cycle relative to the current initial string and in the current sentence.

If the string associated with the implied cycle cannot be imbedded in the corresponding test string to produce a sentence grammatical in the language L(G), then this is sufficient to reject the trial target state as the target to which the initial string is to be classified.

In terms of the worked example above: for the first sentence, when 0 F a is believed to belong to the trial target state (at D ← 1) by dint of the grammaticality of ((0 F a), bc), we note that this implies that, at the very least, ((0 F a)F a, bc) must also be grammatical. Otherwise, 0 F a could not be state equivalent to the state 0. At t ← 2 (D ← 2), 1 F b is targeted to state 1. This implies that (1 F b)F b is a substring which can be imbedded in the initial string and produce a grammatical sentence. Since it does not, the state 1 is split; that is, despite the grammaticality of (b,c), the state 1 is rejected. In this case, the established states of the current partition were all rejected and so the new state (3) was introduced to the partition.

It  is  important  to  note  that  this  technique  deals  with  a 
special  case  of  the  more  general  problem  which  confronts  the  state 
discovery  algorithm  and  for  which  the  algorithm  REFINEV  was 
developed;  that  is  to  say,  REFINEV  is  sufficient  to  carry  out  the 
machine  synthesis.  By  checking  for  these  cycles  during  the 
classification  of  an  initial  string  the  convergence  of  the  process 
is  improved.  In  particular,  states  are  discovered  earlier;  this 
is  important  in  the  early  stages  of  the  algorithm  because  of  the 
frequency tabulations which are later used to estimate the transition probabilities.
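The first worked example can be re-traced with a small program. What follows is a simplified sketch, not the report's APL implementation: it keeps only SPEAKV-style testing of representors and the cycle check described above, it caches each classified pair instead of re-testing it, and it omits REFINEV and tagging. The oracle `g`, the function names, and the choice of the first occurrence of the target state in the history are assumptions made for illustration.

```python
def make_oracle():
    """Grammaticality function g for the grammar 1->a1, 1->b2, 2->b3, 2->c, 3->a1, 3->c."""
    delta = {(1, "a"): 1, (1, "b"): 2, (2, "b"): 3, (2, "c"): "F",
             (3, "a"): 1, (3, "c"): "F"}

    def g(words):
        q = 1
        for w in words:
            q = delta.get((q, w))
            if q is None:
                return False
        return q == "F"

    return g


def classify(v, w, rest, reps, hist, g):
    """Classify the initial string (state v followed by word w) given the remaining words."""
    u = reps[v] + [w]        # the initial string written via representors
    h = hist[v] + [v]        # its history: the state before each word of u
    for t, r in enumerate(reps):
        if not g(r + rest):
            continue         # trial target t rejected: representor + rest not grammatical
        if t in h:
            cycle = u[h.index(t):]
            if not g(u + cycle + rest):
                continue     # implied cycle is not grammatical: reject t
        return t
    reps.append(u)           # all existing states rejected: CREATE a new state
    hist.append(h)
    return len(reps) - 1


def discover_states(sentences, g):
    reps = [[]]              # state 0 is the start state; its representor is NULL
    hist = [[]]
    target = {}              # (state, word) -> state believed to follow
    for s in sentences:
        v = 0
        for d, w in enumerate(s):
            if (v, w) not in target:
                target[(v, w)] = classify(v, w, list(s[d + 1:]), reps, hist, g)
            v = target[(v, w)]
    return reps, target


reps, target = discover_states(["abc", "bbc", "abbabc"], make_oracle())
print(reps)
print(target)
```

On the training sequence abc, bbc, abbabc this sketch creates the representors NULL, b, bc and bb, matching the states created in the trace of Figure 6.5.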

d. The second example is based on the output of an inference machine described in Biermann and Feldman [1972]; the grammar is

1 → a2     2 → a1     3 → a4     4 → a4
1 → a      2 → b4     3 → a      4 → a
1 → b3     2 → b      3 → b3     4 → b1

The transition diagram of the corresponding non-deterministic
automaton is shown in Figure 6.8.

Figure 6.8

The deterministic syntax-controlled transition diagram is shown
in Figure 6.9; the arcs are labeled with the transition
probabilities. This is the sought-for automaton.

The notations on the printed output which follows are explained
in section 7. For clarity, stages of the automaton under
construction are shown in Figure 6.10 (at t=1) and 6.11 (at t=2).

The details of the computer implementation follow in section 6.

In Figure 6.12, the constructed automaton is shown; the relative
frequencies of the internal transitions are also given in parentheses.

Figure 6.11

[Printed output for this example: for each sentence, a chart with the
columns TARGET, S[D], CODE and FREQUENCY. The tabulated entries are
not legible in this copy.]

4. The constructions carried out above use the representation
of the elements being classified (the initial strings) in the
form V F S[D]; that is, state V which is followed by word S[D].

As noted earlier, the states and words are conveniently denoted
by integers. The structural derivation of a string (the sequence
of states beginning with the start state and ending with the final
state which corresponds to the string) is not normally written
with the string. For example, for the string S of length 4, we
can show its structural derivation:

        S[1]    S[2]    S[3]    S[4]     •
    V0 ---> V1 ---> V2 ---> V3 ---> V4 ---> F

Indeed, the problem developed and solved in this chapter is to
find the structural derivations for strings given a sample of
strings in the language; the algorithm finds the "intermediate"
states V1, ..., VL for any string of length L. The explicit
form of the next state function can be decoded as a decimal integer
which is uniquely associated to the positional numbers V (the
state index) and X (the word index) to some radix KEY (an integer
greater than one) as "X+V×KEY". The target state to which a code
is classified and the code determine the next state function for
state V and word X.
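
A minimal Python sketch of this coding follows, assuming that word
indices lie in the range 0 to KEY-1 so that distinct (state, word)
pairs receive distinct codes. The name f_code is hypothetical; in the
APL listings the corresponding function is called F.

```python
def f_code(v, x, key):
    """Code of the initial string 'state v followed by word x'.

    Implements K = x + v*key, i.e. the pair (v, x) read as a two-digit
    number in radix key.
    """
    if key <= 1:
        raise ValueError("KEY must be an integer greater than one")
    if not 0 <= x < key:
        raise ValueError("word index must lie in 0..KEY-1")
    return x + v * key


# With KEY = 4, state 1 followed by word 3 receives the code 7.
example = f_code(1, 3, 4)
```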

In terms of the worked example of section 3b above, let the
words a, b, and c correspond to the integers 1, 2, and 3; let
KEY ← 4 (chosen here as the number of words and "dot" = {•}).

Then

    0 F a → 1
    0 F b → 2
    1 F c → 7
    1 F b → 6
    3 F c → 15
    3 F a → 13

That is, the list of codes which correspond to the initial
strings of the language is

    1 2 6 7 15 13

The codes which correspond to the state prototypes are 0, 2, 6
and 7. For convenience, we included the string NULL (an element
of V*) which we decode as 0. In the example above, the sought-for
automaton was determined to within a renumbering of the states
(of the "original" generator/acceptor automaton). The total
number of distinct initial string codes which could be formed is
sixteen; these are 0, 1, ..., 14, and 15. After the construction
has been carried out, a subset of this list of integers is
partitioned into the state equivalence classes as follows: state
zero contains codes 1 and 13 (for convenience, the code 0 is not
included in this list); state one contains code 2; state two
contains codes 7 and 15; and state three contains code 6. The
remaining "possible" codes (over the alphabet of words a, b, and c
and the four states), viz., 3, 5, 9, 10, 11, and 14, are the
"discards"; these codes correspond to elements of V* which are
not initial strings of any elements of the language L. Note that
codes 4, 8, and 12 are not considered as admissible due to the
special significance of the symbol •. Neither the discards nor
the inadmissible codes enter the partitioning algorithm explicitly.

It is necessary to select the KEY large enough to ensure
that all the possible distinct initial string codes are decoded
as distinct integer codes. In the case of a dictionary of NW
words, choose KEY as NW. In the case that a language has word
classes, a convenient KEY would be the number of these classes
(NC) after all the classes have been found. Here and elsewhere,
we shall assume that the language has NC classes (where it is
possible that NC attains the value NW). Then the number of
distinct possible codes (i.e., the number of distinct initial
string codes) is (NV)·(NC).
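
The bookkeeping of the worked example (KEY = 4, four states) can be
checked with a short Python sketch. The class contents are copied
from the text; the function name sort_codes is hypothetical, and the
treatment of word index 0 as the "dot" is an assumption drawn from the
choice of KEY.

```python
KEY = 4          # words a, b, c are 1, 2, 3; index 0 is taken to be the dot
N_STATES = 4

# State equivalence classes of codes, as listed in the text.
CLASSES = {0: [1, 13], 1: [2], 2: [7, 15], 3: [6]}


def sort_codes(key, n_states, classes):
    """Split the unclassified codes into inadmissible codes and discards."""
    classified = {code for codes in classes.values() for code in codes}
    inadmissible, discards = [], []
    for code in range(1, key * n_states):   # code 0 is NULL and is skipped
        _state, word = divmod(code, key)
        if word == 0:
            inadmissible.append(code)       # would encode 'state, dot'
        elif code not in classified:
            discards.append(code)           # not an initial string of L
    return inadmissible, discards


inadmissible, discards = sort_codes(KEY, N_STATES, CLASSES)
```

Running it reproduces the two lists given above: 4, 8, 12 as
inadmissible and 3, 5, 9, 10, 11, 14 as discards.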

Some of the advantages of this radix representation (over,
say, the obvious tabular form) are that (1) only those decoded
values which correspond to initial strings of the language need
to be formed; (2) this list structure of integers is partitioned
into the state equivalence classes by an algorithm in a manner
analogous to the one developed earlier: by decomposition or
splitting of classes; (3) the representation of an initial string
in "string form" can be easily recovered by the inverse of the
decoding function F (described below); and (4) the representations
of the initial strings carry their structural derivations within
themselves.

The inverse of F is defined as follows: for any code K, and
for any integer KEY, K = (V)·KEY+X. The representor of state V,
denoted by K', is encoded in the same way: K' = (V')·KEY+X'.

Continue in this way until the quotient (in this division
algorithm) is zero. The sequence of quotients 0,...,V', V is the
structural derivation of the initial string whose code is K. The
algorithm which carries out these steps is called ENCODE.
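
The repeated division can be sketched in Python as follows. This is
an illustration under stated assumptions, not a transcription of the
APL function ENCODE: the name structural_derivation is hypothetical,
and the table of representor codes (state 1 by code 2, state 2 by
code 7, state 3 by code 6) is read off the worked example with
KEY = 4.

```python
def structural_derivation(code, key, representor):
    """Recover the state sequence and the words of an initial string.

    code        -- K = V*key + X for the initial string
    representor -- maps each state other than 0 to the code of its
                   representor string (state 0 is represented by NULL)
    """
    states, words = [], []
    while True:
        state, word = divmod(code, key)   # quotient V, remainder X
        states.append(state)
        words.append(word)
        if state == 0:                    # quotient zero: start state reached
            break
        code = representor[state]         # continue with the representor of V
    return states[::-1], words[::-1]


REPRESENTOR = {1: 2, 2: 7, 3: 6}
states, words = structural_derivation(15, 4, REPRESENTOR)
```

For code 15 (3 F c) this yields the states 0, 1, 3 and the words
2, 2, 3, that is, the initial string b b c. The loop terminates only
if each representor is itself built from states discovered earlier,
which the sketch takes for granted.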
5. a. The syntax-controlled probability sentence generator
produces sentences S1, S2, ..., St, ... at time intervals t=1,2,...
The underlying automaton M has internal states 1,2,...,nv, the
final state F and a next state function f: VN × VT → VN ∪ {F}.
Denote by f̂ the extension of f defined by

    f̂(V,NULL) = V
    f̂(1,S) = F for all S ∈ L;

and for V ∈ VN, x ∈ VT, and for all u ∈ V*

    f̂(V,ux) = f(f̂(V,u),x).

With the addition of the state T which corresponds to the
complement of VN ∪ F in V*, the automaton M̂ is said to be
completely specified. In the following we write M for M̂ and f
for f̂ and we assume that M is deterministic with the minimal
number of states.
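
A small Python sketch of the extended next state function and of the
completing state T follows. The transition table is an assumption: it
is read off the worked example of section 4 (where the start state is
numbered 0 rather than 1), and the names NEXT, TRAP and f_hat are
hypothetical.

```python
TRAP = "T"   # the added state that makes the automaton completely specified

# Assumed one-step next state function f, from the example of section 4.
NEXT = {(0, "a"): 0, (0, "b"): 1, (1, "c"): 2,
        (1, "b"): 3, (3, "c"): 2, (3, "a"): 0}


def f_hat(state, string, nxt=NEXT):
    """Extension of f to strings.

    f_hat(V, NULL) = V and f_hat(V, ux) = f(f_hat(V, u), x); any move
    that f leaves undefined leads to TRAP, which is never left again.
    """
    for word in string:
        state = nxt.get((state, word), TRAP)
    return state
```

For instance, f_hat(0, "bbc") follows 0, 1, 3, 2, while
f_hat(0, "cab") falls into the trap state at the first word and
stays there.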

b. At t=0, there is only one state V0 (represented by NULL);
process each sentence by the algorithm described earlier.

That is, for each sentence S(t) at time t of length L(t), and
for depth d=1,2,...,|S(t)| rewrite

    S(t) = ud(t),zd(t)
         = (V F Xd(t)),zd(t).

If for all times t, zd(t) ∈ L, that is

    V0 F Xd(t) = 0,

then convergence is obtained. Otherwise, at some finite t, for some
depth d

    V0 F Xd(t) ≠ 0

i.e. (NULL),zd(t) ∉ L and we denote by V1 the state discovered
which is defined by V0 F Xd(t) and has as its representor the
string denoted [V1] which is NULL,Xd(t) = Xd(t). This new state
corresponds to one of the internal states of M.

c. At any time t suppose m=m(t) states have been discovered
and for some depth d=1 or 2 or ... or |S(t)|

    S(t) = V F X,z; then either

    (1) V F X is classified to an existing state among the discovered
        states V0,V1,...,Vm-1, or

    (2) V F X discovers a state Vm.

In order to create this state, we require that g(u,z) ≠ 1 for
all u ∈ [V0,V1,...,Vm-1], where [V0,V1,...,Vm-1] denotes the set
of representors of these states. We note that
the set V0,V1,...,Vm is a subset of 1,2,...,nv.

Clearly, m(t) ≤ nv for all times. Otherwise, for some
S = V F X,z ∈ L we would have

    g(V F X,z) ≠ g([k],z) for k=1,2,...,nv

i.e. V F X is not equivalent to any of the subsets of the partition
of V*. This is impossible. Note that this ensures that the
constructed automaton will have at most the minimal number of
states required.
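
The classify-or-discover step can be illustrated with a deliberately
simplified Python sketch. It is not the algorithm of the APL
listings: it compares an initial string with each representor on a
fixed, hypothetical list of test suffixes, whereas the text draws its
tests from the incoming sentences and refines later with REFINEV. The
toy acceptor (transitions from the example of section 4, with state 2
assumed accepting) stands in for the teacher, and all names are
hypothetical.

```python
# Assumed toy acceptor standing in for the teacher.
NEXT = {(0, "a"): 0, (0, "b"): 1, (1, "c"): 2,
        (1, "b"): 3, (3, "c"): 2, (3, "a"): 0}


def accepts(string):
    """True if the string is grammatical for the toy acceptor."""
    state = 0
    for word in string:
        state = NEXT.get((state, word))
        if state is None:
            return False
    return state == 2


def g(u, z):
    """Grammaticality indicator of the catenation u,z (1 or 0)."""
    return 1 if accepts(u + z) else 0


def classify_or_discover(u, representors, tests):
    """Return the index of the state to which u is classified.

    u joins the first state whose representor agrees with it on every
    test suffix; if none agrees, u becomes the representor of a new state.
    """
    for k, rep in enumerate(representors):
        if all(g(u, z) == g(rep, z) for z in tests):
            return k
    representors.append(u)
    return len(representors) - 1


representors = [""]          # state 0 is represented by NULL
tests = ["", "c", "bc"]      # hypothetical fixed set of test suffixes
targets = {u: classify_or_discover(u, representors, tests)
           for u in ["a", "b", "bb", "bc", "bbc", "bba"]}
```

With these inputs four states are found, represented by NULL, b, bb
and bc. The indices follow the order of discovery, so they agree with
the numbering of section 4 only up to a renumbering of the states.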
d. For any initial string u = V F X which the algorithm
processes by classification to a state, the state index is
nondecreasing. That is, if for some time t1, g(u,z1) ≠ g(NULL,z1),
where S = u,z1, then u ∉ {v:v ≡ NULL} and it must belong to one of
the other subsets. If g(u,z1) ≠ g([V1],z1), then u belongs to one
of the sets represented by V2,...,Vm(t1)-1. Suppose

    g(u,z1) = g([Vk],z1).

Now suppose that for t > t1, S = u,z

    g(u,z) ≠ g([Vk],z);

then u belongs to one of the sets represented by

    Vk+1, Vk+2, ..., Vm(t)-1.

e. The set {V0,V1,...,Vm(t)-1} is a permutation of the
indices {1,2,...,nv} for all t > t* where t* is finite. Otherwise,
m(t) < nv for all time. This can only happen if, for example,

    V F X,z is in L for all strings z

for which

    [V1],z is in L but not conversely.

This results in the incorrect classification of V F X to V1.

The event S(t) = [V1],z for which (V F X,z) is not in L must
occur. Moreover, the subalgorithm REFINEV requires that selection
from the constructed set {v:v ≡ [V1]} be made randomly; hence for
finite time t the test event g(V F X,z) occurs with probability
one.

This test event will cause V F X to be classified to a higher
index. Hence for finite time, V F X will be discovered with
probability one.

f. The construction produces a deterministic automaton.

For otherwise, we would have

    V F X ∈ I = {v:v ≡ [Vi]} and
    V F X ∈ J = {v:v ≡ [Vj]} for some V, some X and Vi ≠ Vj.

Since Vi ≠ Vj, there exists at least one z ∈ V* for which

    g([I],z) ≠ g([J],z).

But if V F X ∈ I, then g(V F X,z) = g([I],z). Also, if
V F X ∈ J, then g(V F X,z) = g([J],z). This is a contradiction.

6. The APL description of the algorithm described in the preceding
sections is given in this section; it follows a format
analogous to the one used in chapter two.

TABULA establishes PROTOS - the word class pointers - for the
transduction, initializes VPROTOS, VC, KEY, TALLY and TS, the
sentence counter. The random number generator seed is also
initialized.


V TABULA  -,I  HDI CES 
C-NV+  \HW 

IHDICES-(V1C-Q,~1*V1C)/ ip VIC 
PHOTOS- C L IN DICED  3 
V PHOTOS- VC- , 0 

ket-(  i + iooxA  C)  ,i:c- r me 

UHL-1  * 5 
TS-  0 

TALLY-, 0 


Figure  6.13 


PS processes MORE sentences. It invokes REWRITE to produce the
sentence to be processed.

V PC  MORE \V\S\D i BELIEF ; T1 ; X ; A CTIOH 
Tl  — 1 

HE  US  -.-OUT  IF  MORE  < Tl 

J^kl)UCTJUN  tiEn’HI ?E 
V—  0 
D- 1 

LI  -.-THRU  IF  DUOS 
ACTIO U-C 

BELIEF— ( K— V £ SlD  ])  tVPROTOS 
L2 : — L3  IF  BELIEF 

■ACnjdiT^i 

B ELIEF- ES  TAB  LI SliV 
-L2 

L 3 : — L4  IF  ACTION 
ACT ION- REF  I NEV 
-L  3 


L4  : V-HOTE  ri 
D-D*  1 
-LI 

THRU ‘.FINAL  V 
T\-T\*\ 

-HENS 


Figure 6.14


OUT:- 0 
XDUCTION carries out the transduction: words are replaced by
class prototypes. The counter TS is updated and it is printed
with the original string and the table headings for the output
which follows.


V T-*- XDUCTION  S ; NB  ; CD  ; B 

(B  .OpNB*-eB-Qti,  (WTS-TS+1  ).  ' ),  PRINTS 

T-*-PROTOBlCD*-+S(C\ PROTON  ) » . sCii’J 

(Gp*  ').  ' TARGET • ,(4p«  •).  *SlD}'  .(6p*  ').  ' CODE  FREQUENCY' 

( 3p  1 ' ) , 6 0 TO 


Figure  6.15 


F maps the pair V, W into the integer code according to the value
of KEY.


7 Z + V F Q 
Z-KEYi/.PROTOSxW 


Figure  6.16 


ESTABLISHV carries out the classification of initial strings as
described in the text.



V BELIEF+ESTABLIBHV-.RV 
BELIEF *-0 

LI  :-OUT  IF  BELIEF 

RV+-K  IN  LILT  VPROTOS 
L2 :+L 3 IF  BELIEFvQ -qR V 

BE LI EF-JPEAK V 1 Ui  V 
R V+l+RV 

±L2 

L 3 :+Ll  IF  BELIEF 
BELIEF-*- CREATE 
+L3  

OUTi+Q 


Figure  6.17 
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j 3ELI  ZF-3PEAXV  R <7  ; RETURN-,  rAR  77.  T HI  STORY  TAR7ET  J 
RELIEF*- 3 

RETJRH*—A0CEPT(.SNC03Z  VCl.RJ J ) ,ZHJ 

A-.+OJT  IF  RETURN  


P4  .7  7F  r-  1 ♦■  ( X,?  I XP.7P  r J J ) i .7-7 
R I3T0R  Y*-~  l + PATH  K 
3ELI3F+TAR 7 STe.  HISTORY 

L2-.+L  1 IP  RE T'JR  V 


3: -L4  JF~ DEL  IE F 
UUENCODE  K 

FEY  URN* — ACCEPT  U , ( Cl  + HISTORY  \TARGET  ) ♦ £/)  ,D  + 5 
RELIEF*-  0 


L4  : —1/2  IF  RET  JR  V 


Oi/r  :-0 


V+KB1TV 

Z*-,PROTOSLjYZ  2]] 
FZr/4  /—VC  1 J 3 0 
&2  :-*■£!  fF  FL4  7 


E*  r /P.?7r<77[  1 +VL  1 J J 
J-(P.?JrjJ[V[2j*\V[  2]-3j),Z 
FLA7*-Wl  1 J =0 


pure  6 


02 


LASTV is invoked to move code K at IK to the end of the list VC.


7 Z-LA3TV  TK ; T 

I “1 +IX) , ( IK  70  VC) ,IK 

vc+vctn 

tally-ia  4 J U nrj 


Figur< 


F: As in word class partitioning, F cuts a list of elements
at A and returns pointers to elements at the right of A.


i' 


7 Z-A  F .1 
Z+A+i3-A 


Figure  ( 


CREATE adds the code K to the initial string representors of
the newly established state of the partition.


V Z+-CHEATE  ilK 
IK+-VC \K 
TALLYLIK1+Q 
VPPOTOS*-  VPPOTOS , K 
Z*-  LAS  TV  VC  \K 


Figure  6.27 


REFINEV tests initial strings classified to a state to ensure
that the relation under construction be symmetric.


V ACTIO  N*-REPINEV ;SZ  -,VIK  tBViHJiJ  -.IK-.FLAG 
IK*-  VC \K 

-I-(.1*I*-VC\VPH0T0S  ) . 1 + pVOl  VPROTOS 
ACTION-Q  -pS'Z 
L1:-*L5  IF  ACTION 

I L2  i-*L  1 If  ACTION 


VIK-IK+GZ 

FLAG*-0=pI*-(0=TALLYL  VIKj)/VIK 
L3:-*L4  If  FLAG 

Avgc[TC?pi]  J ‘ 

ACTION *-l 
FLA  G*- 1 


| e4  :-*L2  IF  ACTION 

I*-,l  + ([ /TALL  I [ VIKlT-  TALLX  [ VI  K'. 
K+VCl  VI Kl  r /?2ppl  ] J 
I ACTION *1 


: /l  CTION-K  = VCIIK} 

Lb  : MJUT  IF  ACTION 

A CHON*- A CCEPT  ( ftf  <70  K)%D*S 

Ll:-*Lb  IF  ACTION 

NV-1  +K  IN  LI  ST  VPNOTOS 
Lb:-*L9  IF  A_CTION*0=pHV 
NT*-l  tNV 
ACTION*-MO  VEV 
I TALL  'iV  VC\K  J«-~l 


L9:-*L7  If  ACTION 
f A CTIuN*  CTKATT 

- ^ 

OUT:-*  0 
NOTE updates TARGET and produces output to report on the actions
taken.


V TAHGET*NOTE  K1  -,K  -,IK  -.A  ; JKl  \FLAG  -,AOK 
K-V  £ GlUl 
IK*-  VC \K 

TAHGET*~1++ /IKiVCxV PHOTOS 

TALLY [ IK }*TALLYIIK1+1 

A* ( l + “ 1 tpVOHDS ) SPRINT ,S[D] 

HM3p'  ’).(  6 0 1TAHGET)  , ( 6p  ' ' ) . >4  . ' ',(  4 0 IK) , a 0 1TALLYIIK] 

FLAG*K =K1 
LI  :-*OUT  IF  FLAG 


OUT:  ' ' 


A*  ' u’ 

IKl«-FClXl 

AOK*TAHCET=~l  + + /IKli:VC\  VPK07US 
L2:*L 4 IF  AUK 


AOK-TALLYLIK1 } =0 
[L3  :-*L2  If  ylOf 


/l-  • t' 

i4dK«-l 

-L3 


L4  tTALLYlIKl 1*TALLYLIK1 Jtl 
CM  ap  ' ')./!.(  4 0 ?2ViLLniKlJ  ,K1  ) 

FL4C-1 
•*L1 


Figure  6.29 
The state discovery algorithm was applied to the fragment
grammar and the results appear below. For each sentence
processed, the output consists of the chart shown; it includes
the target state followed by a word class prototype (transduction)
which decodes to the code indicated and has the frequency in the
target state as given in the last column. For illustration,

    TARGET   S[D]   CODE   FREQUENCY
      U
      V       X      K        ν

is to be interpreted: K ← U F X is classified to state V; its
frequency in this state is ν. Note that for any sentence, its
transduced string can be read down the chart. The additional
information to the right in the chart, when it appears, is
information produced when REFINEV is invoked; this consists
of a symbol (as in word partitioning), frequency and string code.
We note that all the states are discovered at sentence 19.


[Computer printout of the run: sentences 1 through 20 of the fragment
grammar, each followed by its chart with the columns TARGET, S[D],
CODE and FREQUENCY and, where REFINEV was invoked, the additional
symbol, frequency and string code. The tabulated entries are not
legible in this copy.]

In the following results, we report on an experiment carried
out as follows: every word is established as the word class
prototype of its own class. For the grammars considered,
this establishes NW classes. The state discovery is carried out
successfully as shown below. The space complexity of the initial
string code list is of course increased.


[Computer printout of the run in which every word is its own class
prototype: sentences 1 through 25, each followed by its chart with
the columns TARGET, S[D], CODE and FREQUENCY and, where REFINEV was
invoked, the additional symbol, frequency and string code. The
tabulated entries are not legible in this copy.]

We note that all the states are discovered at sentence 13. The
"earlier" discovery in this case (compared to the results in
section 7) is due to the altered sequence of random numbers which
drives the sentence generator REWRITE. That is, sentence 6 in
this section is not the same as sentence six in section 7.


Figure  6.30.  The  discovered  automaton. 
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Appendix  A 

We collect together in this section descriptions of the
subalgorithms referred to in the previous chapters, a
descriptive list of the data elements, and hierarchical
communication diagrams for the subalgorithms.
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is an array of characters of the words in the vocabulary.

is  the  number  of  characters  in  each  word  in  WORDS. 

is an array used by REWRITE to generate sentences in the
language. The first column is a list of the cumulative
transition probabilities for each variable being
rewritten. Across a row in RULES, if the entry in column
two is 2, the variable is rewritten as the list in columns
three and four; in the case of a terminating rule, column
two contains a 1, and only the word in column three is
"catenated" to the list of word pointers. In this latter
case, column four contains
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is  a list  of  the  number  of  rewrite  rules  available  for 
each  variable. 

is  0,  NRULES. 
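
The way REWRITE might consume a table of this shape can be sketched as follows. This is a minimal illustration in Python, not the APL of the thesis. The toy grammar, the helper names OFFSETS, rewrite and sentence, the reading of columns three and four as "word pointer, next variable", and the use of None in column four of a terminating rule are all assumptions made for the example.

```python
import random
from itertools import accumulate

# Hypothetical vocabulary; word pointers are indices into this list.
WORDS = ["ROVER", "IS", "WAS", "HELPED", "SEEN", "."]

# Each row: (cumulative probability, kind, word pointer, next variable).
# kind 2: emit the word, then rewrite the variable in column four (assumed layout).
# kind 1: terminating rule, emit the word and stop; column four unused (assumption).
RULES = [
    (1.0, 2, 0, 1),     # variable 0 -> ROVER  variable 1
    (0.5, 2, 1, 2),     # variable 1 -> IS     variable 2
    (1.0, 2, 2, 2),     # variable 1 -> WAS    variable 2
    (0.5, 2, 3, 3),     # variable 2 -> HELPED variable 3
    (1.0, 2, 4, 3),     # variable 2 -> SEEN   variable 3
    (1.0, 1, 5, None),  # variable 3 -> .
]

# Number of rewrite rules available for each variable.
NRULES = [1, 2, 2, 1]
# Row index of the first rule of each variable (hypothetical helper).
OFFSETS = [0] + list(accumulate(NRULES))[:-1]

def rewrite(rng, start=0):
    """Generate one sentence as a list of word pointers."""
    pointers = []
    variable = start
    while variable is not None:
        first = OFFSETS[variable]
        rows = RULES[first:first + NRULES[variable]]
        u = rng.random()
        # Choose the first rule whose cumulative probability exceeds u.
        row = next((r for r in rows if u < r[0]), rows[-1])
        _, kind, word, next_variable = row
        pointers.append(word)
        variable = next_variable if kind == 2 else None
    return pointers

def sentence(pointers):
    """Turn a list of word pointers back into text."""
    return " ".join(WORDS[p] for p in pointers)

if __name__ == "__main__":
    for seed in (1, 2):
        print(seed, sentence(rewrite(random.Random(seed))))
```

Because the rule chosen at each step depends only on the random draw, changing the seed changes which sentence appears at a given position in the training sequence, while the same seed reproduces the same sequence.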

is an array which is used by ACCEPT to determine the
grammaticality of strings. For each word (first,
second, ..., last) in the language, the corresponding
column in the array MATRIX (i.e., first, second, ..., last)
is the list of "next states" for that word from the
"current state" which is given by the row.
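
A table-driven acceptance test of this kind can be sketched as below. The sketch is in Python rather than the APL of the thesis, and the state numbering, the REJECT and FINAL markers, and the small "IT IS (NOT) BLUE/ORANGE ." transition table are hypothetical choices made only to show the row (current state) and column (word) lookup.

```python
# Hypothetical vocabulary; the column order of MATRIX follows this list.
WORDS = ["IT", "IS", "NOT", "BLUE", "ORANGE", "."]

REJECT = -1  # assumed marker for "no transition"
FINAL = 5    # assumed accepting state

# MATRIX[state][word index] is the next state.
MATRIX = [
    # IT      IS      NOT     BLUE    ORANGE  .
    [1,      REJECT, REJECT, REJECT, REJECT, REJECT],  # state 0: start
    [REJECT, 2,      REJECT, REJECT, REJECT, REJECT],  # state 1: after IT
    [REJECT, REJECT, 3,      4,      4,      REJECT],  # state 2: after IS
    [REJECT, REJECT, REJECT, 4,      4,      REJECT],  # state 3: after NOT
    [REJECT, REJECT, REJECT, REJECT, REJECT, FINAL],   # state 4: after adjective
    [REJECT] * 6,                                      # state 5: sentence complete
]

def accept(text, start=0):
    """Return True if the string drives the automaton from start to FINAL."""
    state = start
    for word in text.split():
        if word not in WORDS:
            return False
        state = MATRIX[state][WORDS.index(word)]
        if state == REJECT:
            return False
    return state == FINAL

if __name__ == "__main__":
    for s in ("IT IS NOT BLUE .", "IT IS ORANGE .", "IT NOT BLUE ."):
        print(s, accept(s))
```

Storing one next state per (row, column) entry corresponds to a deterministic, completely specified automaton; a nondeterministic table would hold a list of states in each entry instead.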

is  the  list  of  word  pointers. 

is  a list  of  pointers  to  the  prototypes  (as  they  are 
discovered)  in  the  list  C. 

counts  the  number  of  sentences  processed  by  PW;  it  is 
initialized  to  zero  by  BIRTH. 
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in  the  list  VC;  it  tabulates  the  number  of  occurrences 
since  classification  to  a code's  current  state. 

is  a sentence  counter  for  number  of  sentences  processed 
by  PS;  it  is  initialized  by  TABULA. 
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Appendix B


The APL programs listed in Figure B.1 were used to carry out the
calculations of the graphs depicted in Chapter f.
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